library(tidyverse)
library(ggplot2)
#library(summarytools)
library(jtools)
library(naniar)
library(fastDummies)
library(knitr)
library(huxtable)
In this lab we will try to use the csiw dataset but with different variable to understand how this analysis is to be done.
Let us load the required libraries first
Now we can load the dataset
csiw <- read.csv("~/Downloads/csiw07.csv")
Let us see how this dataset looks
summary(csiw, plain.ascii = FALSE)
## school teacher id group
## Min. : 1.000 Min. : 1.00 Min. : 1.00 Min. :-9.000
## 1st Qu.: 1.000 1st Qu.: 4.00 1st Qu.:20.00 1st Qu.: 1.000
## Median : 2.000 Median : 8.00 Median :35.00 Median : 2.000
## Mean : 4.509 Mean :10.91 Mean :32.12 Mean : 1.514
## 3rd Qu.: 9.000 3rd Qu.:18.00 3rd Qu.:44.00 3rd Qu.: 4.000
## Max. :13.000 Max. :55.00 Max. :61.00 Max. : 4.000
##
## treatmt cch1 cch2 ccpt1
## Min. :1.000 Min. :-9.0000 Min. :-9.0000 Min. :-9.000
## 1st Qu.:1.000 1st Qu.: 0.0000 1st Qu.: 0.5000 1st Qu.: 0.000
## Median :1.000 Median : 1.0000 Median : 1.0000 Median : 5.000
## Mean :1.302 Mean :-0.8305 Mean :-0.4619 Mean : 3.101
## 3rd Qu.:2.000 3rd Qu.: 2.0000 3rd Qu.: 2.0000 3rd Qu.: 8.000
## Max. :2.000 Max. : 3.0000 Max. : 3.0000 Max. :16.000
##
## ccpt2 ccprod1 ccprod2 ccrdr1
## Min. :-9.000 Min. :-9.00 Min. :-9.000 Min. :-9.000
## 1st Qu.: 3.000 1st Qu.: 0.00 1st Qu.: 1.000 1st Qu.: 0.000
## Median : 7.000 Median : 2.00 Median : 4.000 Median : 0.000
## Mean : 5.098 Mean : 1.41 Mean : 2.143 Mean :-1.052
## 3rd Qu.:11.000 3rd Qu.: 5.00 3rd Qu.: 6.000 3rd Qu.: 1.000
## Max. :15.000 Max. :17.00 Max. :19.000 Max. : 8.000
##
## ccrdr2 grade hotistdum
## Min. :-9.0000 Min. :-9.0000 Min. :0.0000
## 1st Qu.: 0.0000 1st Qu.: 1.0000 1st Qu.:0.0000
## Median : 0.0000 Median : 1.0000 Median :1.0000
## Mean : 0.1646 Mean : 0.4423 Mean :0.6196
## 3rd Qu.: 2.0000 3rd Qu.: 2.0000 3rd Qu.:1.0000
## Max. :12.0000 Max. : 2.0000 Max. :1.0000
## NA's :81
Keeping in line with the assignment structure we will choose the following variables
treatmtgroupccrdr1ccrdr2gradeNow let us subset this information into a dataset and analyze
csiw <- csiw %>% dplyr::mutate(treat=treatmt, pretest = ccrdr1, posttest=ccrdr2, grade =grade)
csiw <- csiw %>% dplyr::select(treat, pretest, posttest, grade, group)
head(csiw)
| treat | pretest | posttest | grade | group |
| 1 | 1 | -9 | 1 | 3 |
| 1 | -9 | -9 | 1 | 1 |
| 1 | 0 | 6 | 1 | 1 |
| 1 | 3 | 0 | 1 | 1 |
| 1 | 0 | 0 | 1 | 2 |
| 1 | 0 | -9 | 1 | 3 |
We have two recoding tasks ahead of us.
Recoding achievement levels in the group variable. But this is essentially creating dummy variables which is a little bit different than simple recoding. For creation of dummies we will use a convenient R package called fastDummies.
Recoding grade level in the grade variable. For this we will use the recode function in dplyr package.
How does the achievement level variable actuall look
table(csiw$group)
##
## -9 1 2 3 4
## 40 84 82 76 125
Other than the four categories of relevant levels, we also have a -9 for the missing data. While there are many ways to deal with missing data such as mean/median/mode substitution or multiple imputation techniques among others. In our analysis, we will simply drop the data under the assumption of Missing at Random. It simple means that once we control for the relevant confounders, the data can be assumed to be missing at random and hence dropping those observations will not create bias in our estimates.
Therefore, we will replace the -9 with NAs. We will be using a powerful package called naniar to deal with missing values.
#install.packages("naniar")
library(naniar)
csiw <- csiw %>% replace_with_na(replace = list(group = -9))
We can also do this for the remaning variables in the dataset
table(csiw$grade)
##
## -9 1 2
## 40 194 173
csiw <- csiw %>% replace_with_na(replace=list(pretest=-9,posttest=-9, grade=-9))
Now that we have dealt with the missing data, let us move onto the recoding process for group and grade. But remember…
Variable to be recoded has to be a Factor Variable.
Are our variables of interest in the factor type?
class(csiw$treat)
## [1] "integer"
class(csiw$group)
## [1] "integer"
class(csiw$grade)
## [1] "integer"
table(csiw$treat)
##
## 1 2
## 284 123
table(csiw$group)
##
## 1 2 3 4
## 84 82 76 125
table(csiw$grade)
##
## 1 2
## 194 173
So let us convert them to factors first Good practice to convert categorical variables as factors treatment needs to be considered numeric when running regression, but other covariates can remain as factors
csiw$treat <- as.factor(csiw$treat)
table(csiw$treat)
##
## 1 2
## 284 123
# convert into an ordered factor variable into unordered
csiw$treat <- factor(as.numeric(as.character(csiw$treat)), ordered=FALSE)
table(csiw$treat)
##
## 1 2
## 284 123
csiw$group <- as.factor(csiw$group)
csiw$grade <- as.factor(csiw$grade)
Since we need treat vaiable in the 1 or 0 form we will recode this variable
csiw$treat <- recode(csiw$treat, "1"="1","2"="0")
csiw$treat <- as.character(csiw$treat)
csiw$treat <- as.numeric(csiw$treat)
table(csiw$treat)
##
## 0 1
## 123 284
For the group variable we need to create dummies as below
# lets us first recode the entries in the group column for ease of use once we create dummies.
# We will see why....
colnames(csiw)
## [1] "treat" "pretest" "posttest" "grade" "group"
csiw$group <- recode(csiw$group, "1"="high","2"= "average", "3"="low", "4"="learndis")
table(csiw$group)
##
## high average low learndis
## 84 82 76 125
Now we can create the required dummies
csiw <- dummy_cols(csiw,select_columns = "group")
colnames(csiw)
## [1] "treat" "pretest" "posttest" "grade"
## [5] "group" "group_low" "group_high" "group_average"
## [9] "group_NA" "group_learndis"
Similarly, let us now work with the grade variable
table(csiw$grade)
##
## 1 2
## 194 173
Now we have to create one dummy for use in this analysis, as before, we can create two dummies for each category but use only one in the analysis.
csiw
| treat | pretest | posttest | grade | group | group_low | group_high | group_average | group_NA | group_learndis |
| 1 | 1 | 1 | low | 1 | 0 | 0 | 0 | 0 | |
| 1 | 1 | high | 0 | 1 | 0 | 0 | 0 | ||
| 1 | 0 | 6 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 3 | 0 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 0 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 1 | low | 1 | 0 | 0 | 0 | 0 | |
| 1 | 0 | 0 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 5 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 1 | high | 0 | 1 | 0 | 0 | 0 | |
| 1 | 0 | 7 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 1 | low | 1 | 0 | 0 | 0 | 0 | |
| 1 | 0 | 3 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 3 | 1 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 10 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 9 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 6 | 9 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 1 | average | 0 | 0 | 1 | 0 | 0 | |
| 1 | 0 | 3 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 1 | 2 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 11 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 2 | 7 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 8 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 1 | 10 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 4 | 11 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | |
| 1 | 3 | 7 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | |
| 1 | 10 | 1 | high | 0 | 1 | 0 | 0 | 0 | |
| 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | ||
| 1 | 1 | 0 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 0 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 10 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 0 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 2 | 1 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 1 | 1 | high | 0 | 1 | 0 | 0 | 0 | |
| 1 | 0 | 0 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 1 | 2 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 1 | 1 | 2 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 1 | 2 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 4 | 9 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 2 | 1 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 0 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 2 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 2 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 6 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 1 | 0 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 0 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 3 | 2 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | |
| 1 | 0 | 0 | 0 | 0 | 1 | 0 | |||
| 1 | 0 | 2 | high | 0 | 1 | 0 | 0 | 0 | |
| 1 | 0 | 5 | 2 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 1 | 2 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 3 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 2 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 2 | high | 0 | 1 | 0 | 0 | 0 | |
| 1 | 0 | 4 | 2 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 6 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 2 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 1 | 0 | 2 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 1 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 3 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 1 | 2 | low | 1 | 0 | 0 | 0 | 0 | |
| 1 | 0 | 2 | high | 0 | 1 | 0 | 0 | 0 | |
| 1 | 0 | 1 | 2 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 2 | average | 0 | 0 | 1 | 0 | 0 | |
| 1 | 0 | 1 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 2 | 2 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 2 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 3 | 2 | 0 | 0 | 0 | 1 | 0 | ||
| 1 | 2 | 0 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 1 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 1 | 0 | 2 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 3 | 2 | low | 1 | 0 | 0 | 0 | 0 | |
| 1 | 1 | 2 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 1 | 2 | low | 1 | 0 | 0 | 0 | 0 | |
| 1 | 0 | 0 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 2 | 2 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 11 | 2 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 1 | 3 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 2 | low | 1 | 0 | 0 | 0 | 0 | |
| 1 | 1 | 0 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 0 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 2 | low | 1 | 0 | 0 | 0 | 0 | ||
| 1 | 2 | 1 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 3 | 2 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 6 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 0 | 2 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | ||
| 1 | 1 | 2 | 0 | 0 | 0 | 1 | 0 | ||
| 1 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | ||
| 1 | 1 | 2 | 0 | 0 | 0 | 1 | 0 | ||
| 1 | 2 | 0 | 0 | 0 | 1 | 0 | |||
| 1 | 1 | 2 | 2 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 1 | 0 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 1 | 2 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 1 | 0 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 1 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 1 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 1 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 1 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 1 | 2 | 10 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 2 | 3 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 1 | 0 | 11 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 2 | 1 | 2 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 10 | 2 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | |
| 1 | 0 | 11 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 4 | 2 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 2 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 2 | 2 | 0 | 0 | 0 | 1 | 0 | ||
| 1 | 1 | 0 | 2 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 2 | 2 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 2 | 3 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 9 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 2 | 1 | 2 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 2 | 2 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 4 | 12 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 2 | 10 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 5 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 4 | 1 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 1 | high | 0 | 1 | 0 | 0 | 0 | |
| 1 | 1 | 1 | low | 1 | 0 | 0 | 0 | 0 | |
| 1 | 0 | 9 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 1 | 4 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 10 | 1 | high | 0 | 1 | 0 | 0 | 0 | |
| 1 | 4 | 2 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 3 | 1 | average | 0 | 0 | 1 | 0 | 0 | |
| 1 | 3 | 10 | 1 | 0 | 0 | 0 | 1 | 0 | |
| 1 | 1 | 7 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 1 | 10 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 4 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 1 | 1 | low | 1 | 0 | 0 | 0 | 0 | |
| 1 | 4 | 1 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 1 | 3 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 2 | 7 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | low | 1 | 0 | 0 | 0 | 0 | |
| 1 | 3 | 3 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 2 | 1 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | low | 1 | 0 | 0 | 0 | 0 | |
| 1 | 0 | 3 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 5 | 10 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 3 | 1 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 3 | 4 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 4 | 10 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 4 | 9 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 3 | 2 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 2 | 10 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 9 | 1 | average | 0 | 0 | 1 | 0 | 0 | |
| 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | ||
| 1 | 0 | 11 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 0 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 1 | 2 | high | 0 | 1 | 0 | 0 | 0 | |
| 1 | 1 | 12 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 1 | 12 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 3 | 7 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 3 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 3 | 2 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 5 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 2 | 5 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 1 | 12 | 2 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 2 | 0 | 0 | 0 | 1 | 0 | |||
| 1 | 1 | 5 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 7 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 12 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 4 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 8 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 9 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 1 | 12 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 9 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 7 | 5 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 6 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 7 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 7 | 2 | 0 | 0 | 0 | 1 | 0 | ||
| 1 | 6 | 2 | 0 | 0 | 0 | 1 | 0 | ||
| 1 | 0 | 10 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 1 | 0 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 1 | 9 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 1 | 0 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | ||
| 1 | 6 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | |
| 1 | 2 | 0 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 2 | 3 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 1 | 10 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 2 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 2 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 9 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 8 | 6 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 12 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 2 | 1 | 0 | 0 | 0 | 1 | 0 | ||
| 1 | 0 | 0 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 2 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 3 | 3 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 1 | 1 | 10 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 1 | low | 1 | 0 | 0 | 0 | 0 | |
| 1 | 0 | 1 | low | 1 | 0 | 0 | 0 | 0 | |
| 1 | 2 | 1 | 0 | 0 | 0 | 1 | 0 | ||
| 1 | 2 | 3 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | ||
| 1 | 0 | 1 | 2 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 1 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 2 | 2 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 1 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 1 | learndis | 0 | 0 | 0 | 0 | 1 | |||
| 1 | 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 1 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 1 | 0 | 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 2 | 2 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 1 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 1 | 1 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 1 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 1 | 0 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 1 | 1 | 1 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 1 | 5 | 2 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 1 | 0 | 6 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 1 | 0 | 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 1 | 2 | 2 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 1 | 1 | 1 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 1 | 0 | 1 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 1 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 2 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 1 | 1 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 1 | 0 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 1 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 1 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 1 | 0 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 1 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 2 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 1 | 0 | 1 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 1 | 1 | 2 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 1 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 1 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 1 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 1 | 0 | 1 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 1 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 1 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 1 | 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 1 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 1 | 0 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 1 | 0 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 1 | 0 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 1 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 1 | 1 | 2 | 2 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 1 | 0 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 1 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 0 | 1 | 3 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 0 | 3 | 1 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 0 | 1 | 1 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 0 | 8 | 1 | high | 0 | 1 | 0 | 0 | 0 | |
| 0 | 3 | 3 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 0 | 0 | 1 | low | 1 | 0 | 0 | 0 | 0 | |
| 0 | 1 | 1 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | average | 0 | 0 | 1 | 0 | 0 | |
| 0 | 0 | 0 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 0 | 1 | 1 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 0 | 4 | 4 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 0 | 2 | 4 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 0 | 6 | 2 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 0 | 4 | 0 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 0 | 0 | 2 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 0 | 1 | 2 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 0 | 1 | low | 1 | 0 | 0 | 0 | 0 | ||
| 0 | 0 | 1 | average | 0 | 0 | 1 | 0 | 0 | |
| 0 | 1 | 0 | 0 | 0 | 1 | 0 | |||
| 0 | 1 | 0 | 0 | 0 | 1 | 0 | |||
| 0 | 2 | 1 | low | 1 | 0 | 0 | 0 | 0 | |
| 0 | 1 | 0 | 0 | 0 | 1 | 0 | |||
| 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | ||
| 0 | 1 | 0 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 0 | 2 | 0 | 0 | 0 | 1 | 0 | |||
| 0 | 0 | 0 | 2 | low | 1 | 0 | 0 | 0 | 0 |
| 0 | 2 | 0 | 0 | 0 | 1 | 0 | |||
| 0 | 0 | 3 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 0 | 1 | 0 | 2 | low | 1 | 0 | 0 | 0 | 0 |
| 0 | 0 | 3 | 2 | low | 1 | 0 | 0 | 0 | 0 |
| 0 | 0 | 3 | 2 | low | 1 | 0 | 0 | 0 | 0 |
| 0 | 2 | 2 | high | 0 | 1 | 0 | 0 | 0 | |
| 0 | 1 | 0 | 2 | low | 1 | 0 | 0 | 0 | 0 |
| 0 | 3 | 2 | high | 0 | 1 | 0 | 0 | 0 | |
| 0 | 2 | 0 | 0 | 0 | 1 | 0 | |||
| 0 | 0 | 0 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 0 | 0 | 0 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 0 | 0 | 0 | 2 | low | 1 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 2 | low | 1 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 2 | high | 0 | 1 | 0 | 0 | 0 |
| 0 | 0 | 0 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 0 | 2 | 2 | high | 0 | 1 | 0 | 0 | 0 | |
| 0 | 0 | 0 | 2 | average | 0 | 0 | 1 | 0 | 0 |
| 0 | 1 | 3 | 2 | low | 1 | 0 | 0 | 0 | 0 |
| 0 | 0 | 2 | high | 0 | 1 | 0 | 0 | 0 | |
| 0 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | ||
| 0 | 0 | 1 | 2 | 0 | 0 | 0 | 1 | 0 | |
| 0 | 2 | 0 | 0 | 0 | 1 | 0 | |||
| 0 | 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 0 | 0 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 0 | 0 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 0 | 1 | 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 |
| 0 | 1 | 2 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 0 | 1 | 2 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 0 | 0 | 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 |
| 0 | 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 0 | 0 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 0 | 0 | 1 | 2 | learndis | 0 | 0 | 0 | 0 | 1 |
| 0 | 1 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 0 | 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 0 | 1 | 2 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 0 | 2 | 2 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 0 | 0 | 2 | 1 | 0 | 0 | 0 | 1 | 0 | |
| 0 | 2 | 1 | 0 | 0 | 0 | 1 | 0 | ||
| 0 | 3 | 0 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | |
| 0 | 1 | average | 0 | 0 | 1 | 0 | 0 | ||
| 0 | 1 | 0 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 0 | 3 | 0 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 0 | 0 | 0 | 1 | low | 1 | 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 | 0 | 1 | 0 | |||
| 0 | 2 | 1 | 0 | 0 | 0 | 1 | 0 | ||
| 0 | 0 | 1 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 0 | 3 | 1 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 0 | 2 | 0 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 0 | 1 | 1 | low | 1 | 0 | 0 | 0 | 0 | |
| 0 | 1 | low | 1 | 0 | 0 | 0 | 0 | ||
| 0 | 2 | 0 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 0 | 1 | 0 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 0 | 3 | 1 | average | 0 | 0 | 1 | 0 | 0 | |
| 0 | 0 | 0 | 1 | high | 0 | 1 | 0 | 0 | 0 |
| 0 | 1 | 0 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 0 | 0 | 0 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | ||
| 0 | 1 | 0 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | |
| 0 | 1 | 1 | 1 | average | 0 | 0 | 1 | 0 | 0 |
| 0 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 0 | 0 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 0 | 0 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 0 | 0 | 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 |
| 0 | 1 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 0 | 1 | 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 |
| 0 | 1 | 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 |
| 0 | 0 | 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 |
| 0 | 3 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 0 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 0 | 0 | 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 |
| 0 | 0 | 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 |
| 0 | 0 | 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 |
| 0 | learndis | 0 | 0 | 0 | 0 | 1 | |||
| 0 | 2 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 0 | 0 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 0 | 0 | 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 |
| 0 | 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 0 | 4 | 0 | 2 | learndis | 0 | 0 | 0 | 0 | 1 |
| 0 | learndis | 0 | 0 | 0 | 0 | 1 | |||
| 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 0 | 0 | 1 | 2 | learndis | 0 | 0 | 0 | 0 | 1 |
| 0 | 2 | 0 | 1 | learndis | 0 | 0 | 0 | 0 | 1 |
| 0 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 0 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 0 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 0 | learndis | 0 | 0 | 0 | 0 | 1 | |||
| 0 | learndis | 0 | 0 | 0 | 0 | 1 | |||
| 0 | learndis | 0 | 0 | 0 | 0 | 1 | |||
| 0 | 0 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | |
| 0 | learndis | 0 | 0 | 0 | 0 | 1 | |||
| 0 | 0 | learndis | 0 | 0 | 0 | 0 | 1 | ||
| 0 | 0 | learndis | 0 | 0 | 0 | 0 | 1 |
csiw <- dummy_cols(csiw,select_columns = "grade")
colnames(csiw)
## [1] "treat" "pretest" "posttest" "grade"
## [5] "group" "group_low" "group_high" "group_average"
## [9] "group_NA" "group_learndis" "grade_1" "grade_2"
## [13] "grade_NA"
We will use the variable grade_2 as the required dummy in this analysis.
Let us run a simple linear regression to estimate the naive model
naive.model <- lm(posttest~treat, data=csiw)
summ(naive.model)
## MODEL INFO:
## Observations: 325 (82 missing obs. deleted)
## Dependent Variable: posttest
## Type: OLS linear regression
##
## MODEL FIT:
## F(1,323) = 36.10, p = 0.00
## R² = 0.10
## Adj. R² = 0.10
##
## Standard errors: OLS
## -----------------------------------------------
## Est. S.E. t val. p
## ----------------- ------ ------ -------- ------
## (Intercept) 0.71 0.35 2.06 0.04
## treat 2.44 0.41 6.01 0.00
## -----------------------------------------------
In ANCOVA the idea is that we have a continuous variable as the outcome (posttest) and a discrete variable as the predictor of interest (treat) and other covariates to be controlled for (pretest).
In R, ANCOVA is also implemented in the lm() funcion we use for linear regression.
ancova.model <- lm(posttest ~ treat + pretest, data=csiw)
summ(ancova.model)
## MODEL INFO:
## Observations: 272 (135 missing obs. deleted)
## Dependent Variable: posttest
## Type: OLS linear regression
##
## MODEL FIT:
## F(2,269) = 24.30, p = 0.00
## R² = 0.15
## Adj. R² = 0.15
##
## Standard errors: OLS
## -----------------------------------------------
## Est. S.E. t val. p
## ----------------- ------ ------ -------- ------
## (Intercept) 0.22 0.40 0.53 0.59
## treat 2.82 0.45 6.28 0.00
## pretest 0.43 0.14 3.03 0.00
## -----------------------------------------------
Now we can also plot the predicted values as a function of pretest and treat (group membership).
# creating a new column in csiw dataset for the predicted values from ANCOVA model
# csiw$ancova.predicted <- predict(ancova.model)
This error happened because while doing linear regression, R drops the rows where column values are NA. So while doing the analysis, R is dropping 135 observations due to missing data.
Let us remove those rows and redo the analysis.
csiw <- csiw[which(complete.cases(csiw)), ]
Now we will do the ANOVA steps again to see if we get this error
ancova.model <- lm(posttest ~ treat + pretest, data=csiw)
summ(ancova.model)
## MODEL INFO:
## Observations: 258
## Dependent Variable: posttest
## Type: OLS linear regression
##
## MODEL FIT:
## F(2,255) = 23.94, p = 0.00
## R² = 0.16
## Adj. R² = 0.15
##
## Standard errors: OLS
## -----------------------------------------------
## Est. S.E. t val. p
## ----------------- ------ ------ -------- ------
## (Intercept) 0.19 0.42 0.44 0.66
## treat 2.95 0.47 6.34 0.00
## pretest 0.43 0.15 2.90 0.00
## -----------------------------------------------
Dropping the missing values seems to change the treatment effect, this might be an indication that the data is not missing at random. But we will continue with the analysis for now
# creating a new column in csiw dataset for the predicted values from ANCOVA model
csiw$ancova.predicted <- predict(ancova.model)
As you can see now we do not have the error message we had earlier.
Now let us implement the required plot using ggplot2 pakage.
ggplot(csiw, aes(x=pretest, y=ancova.predicted, group=factor(treat),color=factor(treat))) + geom_point()+geom_smooth(method="loess", color="darkred")
The assignment asks to plot the residuals from the ANCOVA model against the pretest covariate. We will first calculate the residuals and store it in a new column in csiw and then do the plotting
# storing residuals as new column in csiw
csiw$ancova.residuals <- residuals(ancova.model)
Now we can plot the required graph. Let us add labels to the x and y axis in this plot
ggplot(csiw, aes(x=pretest, y=ancova.residuals, color=factor(treat))) +geom_point()+geom_smooth(method="loess", color="darkred")+
xlab("Pretest Scores") + ylab("Residuals from ANCOVA model") + labs(title="Residuals vs Pretest - ANCOVA model")
How do we add a quadratic fit? We can create a new variable (pretest.sq) for the pretest variable’s squared value and then replot the graph with the squared variable as the covariate.
csiw$pretest.sq <- csiw$pretest^2
Now let us plot the graph again
ggplot(csiw, aes(x=pretest.sq, y=ancova.residuals, color=factor(treat))) +geom_point()+geom_smooth(method="loess", color="darkred")+
xlab("Pretest Scores Squared") + ylab("Residuals from ANCOVA model") + labs(title="Residuals vs Pretest squared - ANCOVA model")
We can see that linearity seems to be satisfied, because the residuals are close to zero on average and seem to be distributed almost equally above and below the curve. we have heterosketastity
Now we have to center the variable pretest and create another variable for the squared value of the centered pretest variable.
# We will use the mutate function to create new variables
pretest.mean <- mean(csiw$pretest)
pretest.mean
## [1] 0.9186047
csiw$pretest.c <- csiw$pretest-pretest.mean
csiw$pretest.csq <- (csiw$pretest.c)^2
Now let us run the ANCOVA again with the centered variable and its square
quadratic.ancova.model <- lm(posttest ~ treat+pretest.c+pretest.csq, data=csiw)
summ(quadratic.ancova.model)
## MODEL INFO:
## Observations: 258
## Dependent Variable: posttest
## Type: OLS linear regression
##
## MODEL FIT:
## F(3,254) = 15.98, p = 0.00
## R² = 0.16
## Adj. R² = 0.15
##
## Standard errors: OLS
## -----------------------------------------------
## Est. S.E. t val. p
## ----------------- ------ ------ -------- ------
## (Intercept) 0.53 0.41 1.29 0.20
## treat 2.94 0.47 6.28 0.00
## pretest.c 0.35 0.23 1.49 0.14
## pretest.csq 0.03 0.07 0.46 0.65
## -----------------------------------------------
Let us look at the predicted values and residual plots of this updated model
csiw$quadratic.ancova.predicted <- predict(quadratic.ancova.model)
csiw$quadratic.ancova.residuals <- residuals(quadratic.ancova.model)
Now the plots
par(mfrow=c(1,2))
ggplot(csiw, aes(x=pretest.c, y = quadratic.ancova.predicted, color=factor(treat))) + geom_point()+
geom_smooth(method="loess", color="darkred")+ xlab("Pretest Scores Centered") + ylab("Predicted values from Quadratic ANCOVA model") + labs(title="Predicted values vs Pretest squared - Quadratic ANCOVA model")
ggplot(csiw, aes(x=pretest.c, y = quadratic.ancova.residuals, color=factor(treat))) + geom_point()+
geom_smooth(method="loess", color="darkred")+ xlab("Pretest Scores Centered") + ylab("Residuals from Quadratic ANCOVA model") + labs(title="Residuals vs Pretest squared - Quadratic ANCOVA model")
In order to identify confounders we need three kinds of tests:
The list of confounders to check:
Let us see some examples. Let us see
# Dummies from achievement level eg. achievement level = high
## corresponding dummy in the dataset = group_high
### outcome vs. predictor = posttest vs. group_high = continuous vs. discrete = ANOVA
summary(aov(posttest~group_high, data=csiw))
## Df Sum Sq Mean Sq F value Pr(>F)
## group_high 1 474.4 474.4 43.2 2.75e-10 ***
## Residuals 256 2811.2 11.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(aov(posttest~group_average, data=csiw))
## Df Sum Sq Mean Sq F value Pr(>F)
## group_average 1 2 2.322 0.181 0.671
## Residuals 256 3283 12.825
summary(aov(posttest~group_low, data=csiw))
## Df Sum Sq Mean Sq F value Pr(>F)
## group_low 1 19 19.05 1.493 0.223
## Residuals 256 3267 12.76
summary(aov(posttest~group_learndis, data=csiw))
## Df Sum Sq Mean Sq F value Pr(>F)
## group_learndis 1 416.5 416.5 37.16 3.99e-09 ***
## Residuals 256 2869.2 11.2
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
### treatment vs. predictor = treat vs group_learndis = Chi Square test
csiw$treat.factor <- factor(csiw$treat)
chisq.test(csiw$treat.factor,csiw$group_high)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: csiw$treat.factor and csiw$group_high
## X-squared = 9.5627, df = 1, p-value = 0.001986
chisq.test(csiw$treat.factor,csiw$group_learndis)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: csiw$treat.factor and csiw$group_learndis
## X-squared = 7.7297, df = 1, p-value = 0.005432
From the results above we can see that this is a potential confounder. Now let us add this to the model and see the output
confounder.check.model <- lm(posttest ~ treat+pretest.c+pretest.csq+group_high + group_learndis, data=csiw)
summ(confounder.check.model)
## MODEL INFO:
## Observations: 258
## Dependent Variable: posttest
## Type: OLS linear regression
##
## MODEL FIT:
## F(5,252) = 20.67, p = 0.00
## R² = 0.29
## Adj. R² = 0.28
##
## Standard errors: OLS
## ---------------------------------------------------
## Est. S.E. t val. p
## -------------------- ------- ------ -------- ------
## (Intercept) 0.82 0.43 1.92 0.06
## treat 2.17 0.45 4.86 0.00
## pretest.c 0.07 0.22 0.32 0.75
## pretest.csq 0.08 0.06 1.35 0.18
## group_high 2.12 0.46 4.62 0.00
## group_learndis -1.73 0.50 -3.48 0.00
## ---------------------------------------------------
Let us look at the predicted values and residual plots of this updated model
csiw$confounder.check.model <- predict(confounder.check.model)
csiw$confounder.check.residuals <- residuals(confounder.check.model)
Now the plots
par(mfrow=c(1,2))
ggplot(csiw, aes(x=pretest.c, y = confounder.check.model, color=factor(treat))) + geom_point()+
geom_smooth(method="loess", color="darkred")+ xlab("Pretest Scores Centered") + ylab("Predicted values from Quadratic ANCOVA model") + labs(title="Predicted values vs Pretest squared - Confounder Check model")
ggplot(csiw, aes(x=pretest.c, y = confounder.check.residuals, color=factor(treat))) + geom_point()+
geom_smooth(method="loess", color="darkred")+ xlab("Pretest Scores Centered") + ylab("Residuals from Quadratic ANCOVA model") + labs(title="Residuals vs Pretest squared - Confounder Check model")
Now let us look at the results we have gotten so far. For this we will use the export_summs() function from the jtools package
export_summs(naive.model, ancova.model, quadratic.ancova.model, confounder.check.model)
| Model 1 | Model 2 | Model 3 | Model 4 | |
| (Intercept) | 0.71 * | 0.19 | 0.53 | 0.82 |
| (0.35) | (0.42) | (0.41) | (0.43) | |
| treat | 2.44 *** | 2.95 *** | 2.94 *** | 2.17 *** |
| (0.41) | (0.47) | (0.47) | (0.45) | |
| pretest | 0.43 ** | |||
| (0.15) | ||||
| pretest.c | 0.35 | 0.07 | ||
| (0.23) | (0.22) | |||
| pretest.csq | 0.03 | 0.08 | ||
| (0.07) | (0.06) | |||
| group_high | 2.12 *** | |||
| (0.46) | ||||
| group_learndis | -1.73 *** | |||
| (0.50) | ||||
| N | 325 | 258 | 258 | 258 |
| R2 | 0.10 | 0.16 | 0.16 | 0.29 |
| *** p < 0.001; ** p < 0.01; * p < 0.05. | ||||
Since the remaining list of covariates to check for confounding are all discrete, we can repeat the steps from above with different covariates.
Exercise: Do the confounder check for grade dummy called grade_2.
grade_2 = 1, if Grade = 5 and grade_2 = 0 if Grade = 4.
interaction.model <- lm(posttest ~ treat*grade_1 +pretest.c+pretest.csq+group_high + group_learndis, data=csiw)
summ(interaction.model)
## MODEL INFO:
## Observations: 258
## Dependent Variable: posttest
## Type: OLS linear regression
##
## MODEL FIT:
## F(7,250) = 14.75, p = 0.00
## R² = 0.29
## Adj. R² = 0.27
##
## Standard errors: OLS
## ---------------------------------------------------
## Est. S.E. t val. p
## -------------------- ------- ------ -------- ------
## (Intercept) 1.14 0.64 1.79 0.07
## treat 1.80 0.69 2.63 0.01
## grade_1 -0.52 0.77 -0.67 0.50
## pretest.c 0.09 0.23 0.42 0.68
## pretest.csq 0.08 0.06 1.27 0.21
## group_high 2.13 0.46 4.62 0.00
## group_learndis -1.77 0.50 -3.51 0.00
## treat:grade_1 0.63 0.89 0.70 0.48
## ---------------------------------------------------
interaction.model2 <- lm(posttest ~ treat*grade_2 +pretest.c+pretest.csq+group_high + group_learndis, data=csiw)
summ(interaction.model2)
## MODEL INFO:
## Observations: 258
## Dependent Variable: posttest
## Type: OLS linear regression
##
## MODEL FIT:
## F(7,250) = 14.75, p = 0.00
## R² = 0.29
## Adj. R² = 0.27
##
## Standard errors: OLS
## ---------------------------------------------------
## Est. S.E. t val. p
## -------------------- ------- ------ -------- ------
## (Intercept) 0.63 0.52 1.20 0.23
## treat 2.43 0.58 4.16 0.00
## grade_2 0.52 0.77 0.67 0.50
## pretest.c 0.09 0.23 0.42 0.68
## pretest.csq 0.08 0.06 1.27 0.21
## group_high 2.13 0.46 4.62 0.00
## group_learndis -1.77 0.50 -3.51 0.00
## treat:grade_2 -0.63 0.89 -0.70 0.48
## ---------------------------------------------------
Exercise: Do the interaction check for treatment and prior achievement level on 4 groups.
plot(interaction.model2)