Understanding intention-to-treat analysis

Randomised controlled trials (RCTs) are often referred to as the “gold standard” for evaluating the impact of an intervention. However, RCTs sometimes suffer from non-compliance, where participants do not receive the treatment they were assigned to.

Intention-to-treat (ITT) is a method of analysis used in RCTs in which all participants are analysed in the group they were originally assigned to, regardless of whether they complied with the treatment. It differs from per-protocol analysis, in which non-compliers are removed from the analysis, which reduces statistical power.

However, participants sometimes cross over from control to treatment, which leads to contamination. Standard ITT ignores contamination altogether. When this happens, contamination-adjusted intention-to-treat (CA ITT) can be used as the method of analysis. CA ITT adjusts the estimated treatment effect on an outcome by taking into account the percentage of participants in each arm who actually received the treatment, thereby providing a more accurate assessment of treatment efficacy in the presence of contamination.
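With a single binary assignment variable, this adjustment is commonly written as a ratio, often called the Wald estimator. The symbols are notation introduced here for illustration and do not appear in the original analysis: Z is the arm a participant was assigned to (1 = treatment, 0 = control), D indicates whether the treatment was actually received, and Y is the outcome.

```latex
\hat{\beta}_{\text{CA ITT}}
  = \frac{\bar{Y}_{Z=1} - \bar{Y}_{Z=0}}{\bar{D}_{Z=1} - \bar{D}_{Z=0}}
```

The numerator is the ordinary ITT effect (difference in mean outcome by assigned arm) and the denominator is the difference between arms in the proportion actually treated. With no contamination and full compliance the denominator is 1 and the two estimates coincide.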

In a nutshell, CA ITT uses assignment to treatment as an instrumental variable. In the first stage, the variable indicating whether treatment was actually received is regressed on the assignment variable; in the second stage, the outcome is regressed on the treatment values predicted by the first stage. This two-stage least squares regression can be fitted with ivreg() in R.
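The two stages can be sketched by hand with lm() on simulated data. This is a minimal illustration, not the study data: the sample size, the 20% cross-over rate, the assumed effect of 1.5 and all variable names are assumptions made for the example.

```r
# Simulated data for illustration only (not the study data)
set.seed(42)
n <- 2000
assignment <- rbinom(n, 1, 0.5)                    # randomised arm (the instrument)
# Everyone assigned to treatment is treated; about 20% of controls cross over
treated <- ifelse(assignment == 1, 1, rbinom(n, 1, 0.2))
outcome <- 4.5 + 1.5 * treated + rnorm(n, sd = 2)  # assumed true effect of 1.5

# Stage 1: regress treatment received on assignment
stage1 <- lm(treated ~ assignment)
treated_hat <- fitted(stage1)

# Stage 2: regress the outcome on the predicted treatment values
stage2 <- lm(outcome ~ treated_hat)
effect_2sls <- unname(coef(stage2)["treated_hat"])

# Same estimate as a ratio: ITT effect divided by the difference in uptake
itt <- mean(outcome[assignment == 1]) - mean(outcome[assignment == 0])
uptake_diff <- mean(treated[assignment == 1]) - mean(treated[assignment == 0])
effect_ratio <- itt / uptake_diff

print(c(two_stage = effect_2sls, ratio = effect_ratio))
```

The two numbers agree because, with one binary instrument, two-stage least squares reduces to this ratio. The standard errors printed by the hand-built second stage are not valid, which is one reason to fit the model with ivreg() rather than two separate lm() calls.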

Research study

An organisation developed an intervention to tackle nutritional challenges faced by families with pregnant mothers in rural communities of Kenya. The intervention’s goal was to enhance dietary diversity, with particular emphasis on improving the Household Dietary Diversity Score (HDDS) as the primary outcome.

To enable the organisation to measure the impact, the intervention was implemented in randomly selected clusters, while other clusters were selected as control areas. At the endline, a cluster randomised controlled trial (cRCT) was conducted with HDDS as the key outcome. The analysis, however, shows that some participants crossed over from control areas to treatment areas. What was the effect of the programme?

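# Packages for data import, tables and instrumental-variable regression
library(haven)       # read_dta() for Stata files
library(knitr)       # kable()
library(kableExtra)  # kable_styling(); also provides %>%
library(ivreg)       # ivreg() for two-stage least squares
library(DT)
library(sjPlot)      # tab_xtab() cross-tabulations
library(expss)       # tab_cells(), tab_cols(), tab_stat_fun()

# Load the endline data (path as on the author's machine)
mydata <- read_dta("C:/Users/Dell/OneDrive - Triggerise/Statistical/2sls/mydata.dta")

# Preview the first 10 rows as a styled HTML table
head(mydata, 10) %>%
  kable("html") %>%
  kable_styling(font_size = 12,
                bootstrap_options = c("striped", "hover", "condensed", "responsive"))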
participant assignment treated HDDS
1 0 0 2
2 1 1 3
3 0 0 3
4 1 1 2
5 1 1 3
6 1 1 4
7 1 1 3
8 1 1 2
9 0 0 3
10 1 1 3

Proportion contaminated

The table below cross-tabulates assignment against treatment actually received. In the overall sample, 213 participants from control areas, representing 21.5% of the respondents assigned to control areas, received the intervention. All 1176 participants assigned to treatment areas received it.

sjPlot::tab_xtab(var.row = mydata$assignment, var.col = mydata$treated, title = "Compliance to treatment", show.row.prc = TRUE)
Compliance to treatment
assignment   treated: No      treated: Yes     Total
Control      779 (78.5 %)     213 (21.5 %)     992 (100 %)
Treatment    0 (0 %)          1176 (100 %)     1176 (100 %)
Total        779 (35.9 %)     1389 (64.1 %)    2168 (100 %)
χ2=1438.009 · df=1 · φ=0.815 · p=0.000
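The contamination rate can be reproduced from the four cell counts in the table. The sketch below types those counts into a matrix rather than reading mydata, so the object names (compliance, row_pct and so on) are illustrative only.

```r
# Cell counts copied from the compliance table above
compliance <- matrix(
  c(779,  213,
      0, 1176),
  nrow = 2, byrow = TRUE,
  dimnames = list(assignment = c("Control", "Treatment"),
                  treated    = c("No", "Yes"))
)

# Row percentages: share of each assigned arm that did / did not receive treatment
row_pct <- round(100 * prop.table(compliance, margin = 1), 1)
print(row_pct)

# Share of the control arm that received the intervention (contamination)
contamination_pct <- row_pct["Control", "Yes"]

# Difference between arms in the proportion actually treated
uptake_diff <- (compliance["Treatment", "Yes"] / sum(compliance["Treatment", ])) -
  (compliance["Control", "Yes"] / sum(compliance["Control", ]))
print(contamination_pct)
print(round(uptake_diff, 3))
```

On the real data the same row percentages should come from prop.table(table(mydata$assignment, mydata$treated), margin = 1).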

Descriptive analysis

The analysis shows that, at the endline, the average Household Dietary Diversity Score (HDDS) was 5.4. Comparing by assignment, mean HDDS was higher in the treatment group than in the control group (5.9 vs 4.8).

mydata %>% 
    tab_cells(HDDS) %>%
    tab_cols(total(label = "#Total| |"), assignment) %>% 
    tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, "Valid N" = w_n, method = list) %>%
    tab_pivot()
                     #Total                        Control                       Treatment
                     Mean  Std. dev.  Valid N      Mean  Std. dev.  Valid N      Mean  Std. dev.  Valid N
 Dietary Diversity   5.4   2.5        2168         4.8   2.3        992          5.9   2.5        1176

CA ITT model (impact)

After adjusting for contamination, the analysis shows that receiving the intervention led to a statistically significant increase of 1.51 units in the Household Dietary Diversity Score (HDDS) (p < 0.0001).

iv_model <- ivreg(HDDS ~ treated | assignment, data = mydata)
summary(iv_model)
## 
## Call:
## ivreg(formula = HDDS ~ treated | assignment, data = mydata)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -4.94388 -1.94388  0.05612  1.56874  6.56874 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.43126    0.09757   45.41   <2e-16 ***
## treated      1.51261    0.12998   11.64   <2e-16 ***
## 
## Diagnostic tests:
##                   df1  df2 statistic p-value    
## Weak instruments    1 2166  4296.990  <2e-16 ***
## Wu-Hausman          1 2165     0.026   0.872    
## Sargan              0   NA        NA      NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.368 on 2166 degrees of freedom
## Multiple R-Squared: 0.0847,  Adjusted R-squared: 0.08427 
## Wald test: 135.4 on 1 and 2166 DF,  p-value: < 2.2e-16
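A 95% confidence interval for the adjusted effect can be worked out from the coefficient table above. The sketch below copies the estimate, standard error and residual degrees of freedom from that output rather than using the fitted object; the variable names are illustrative.

```r
# Values copied from the `treated` row of summary(iv_model)
estimate  <- 1.51261
std_error <- 0.12998
df_resid  <- 2166     # residual degrees of freedom reported in the output

# t statistic and two-sided 95% confidence interval
t_value <- estimate / std_error
t_crit  <- qt(0.975, df = df_resid)
ci      <- estimate + c(-1, 1) * t_crit * std_error

print(round(t_value, 2))
print(round(ci, 2))
```

The t statistic matches the 11.64 shown in the output, and the interval runs from roughly 1.26 to 1.77 HDDS units. With the fitted model available, confint(iv_model) should give the same interval without retyping the numbers.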