Randomized Experiments

Loading Required Libraries, Cleaning the Working Directory and Setting the Directory

library(perm)
library(devtools)
rm(list = ls())
setwd("C:/Users/nirma/Documents/EDX courses/MicroMaster MIT/14.310x-Data Analysis for Social Scientists/Programs")

Calculating Fisher’s Exact p-Value

How many potential assignments across these 8 subjects are possible?

Setting a choose matrix because we have 8 subjects (4-treatment and 4-control) and we are charged with measuring 4 out of 8

perms <- chooseMatrix(8,4)
perms

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
 [1,]    1    1    1    1    0    0    0    0
 [2,]    1    1    1    0    1    0    0    0
 [3,]    1    1    1    0    0    1    0    0
 [4,]    1    1    1    0    0    0    1    0
 [5,]    1    1    1    0    0    0    0    1
 [6,]    1    1    0    1    1    0    0    0
 [7,]    1    1    0    1    0    1    0    0
 [8,]    1    1    0    1    0    0    1    0
 [9,]    1    1    0    1    0    0    0    1
[10,]    1    1    0    0    1    1    0    0
[11,]    1    1    0    0    1    0    1    0
[12,]    1    1    0    0    1    0    0    1
[13,]    1    1    0    0    0    1    1    0
[14,]    1    1    0    0    0    1    0    1
[15,]    1    1    0    0    0    0    1    1
[16,]    1    0    1    1    1    0    0    0
[17,]    1    0    1    1    0    1    0    0
[18,]    1    0    1    1    0    0    1    0
[19,]    1    0    1    1    0    0    0    1
[20,]    1    0    1    0    1    1    0    0
[21,]    1    0    1    0    1    0    1    0
[22,]    1    0    1    0    1    0    0    1
[23,]    1    0    1    0    0    1    1    0
[24,]    1    0    1    0    0    1    0    1
[25,]    1    0    1    0    0    0    1    1
[26,]    1    0    0    1    1    1    0    0
[27,]    1    0    0    1    1    0    1    0
[28,]    1    0    0    1    1    0    0    1
[29,]    1    0    0    1    0    1    1    0
[30,]    1    0    0    1    0    1    0    1
[31,]    1    0    0    1    0    0    1    1
[32,]    1    0    0    0    1    1    1    0
[33,]    1    0    0    0    1    1    0    1
[34,]    1    0    0    0    1    0    1    1
[35,]    1    0    0    0    0    1    1    1
[36,]    0    1    1    1    1    0    0    0
[37,]    0    1    1    1    0    1    0    0
[38,]    0    1    1    1    0    0    1    0
[39,]    0    1    1    1    0    0    0    1
[40,]    0    1    1    0    1    1    0    0
[41,]    0    1    1    0    1    0    1    0
[42,]    0    1    1    0    1    0    0    1
[43,]    0    1    1    0    0    1    1    0
[44,]    0    1    1    0    0    1    0    1
[45,]    0    1    1    0    0    0    1    1
[46,]    0    1    0    1    1    1    0    0
[47,]    0    1    0    1    1    0    1    0
[48,]    0    1    0    1    1    0    0    1
[49,]    0    1    0    1    0    1    1    0
[50,]    0    1    0    1    0    1    0    1
[51,]    0    1    0    1    0    0    1    1
[52,]    0    1    0    0    1    1    1    0
[53,]    0    1    0    0    1    1    0    1
[54,]    0    1    0    0    1    0    1    1
[55,]    0    1    0    0    0    1    1    1
[56,]    0    0    1    1    1    1    0    0
[57,]    0    0    1    1    1    0    1    0
[58,]    0    0    1    1    1    0    0    1
[59,]    0    0    1    1    0    1    1    0
[60,]    0    0    1    1    0    1    0    1
[61,]    0    0    1    1    0    0    1    1
[62,]    0    0    1    0    1    1    1    0
[63,]    0    0    1    0    1    1    0    1
[64,]    0    0    1    0    1    0    1    1
[65,]    0    0    1    0    0    1    1    1
[66,]    0    0    0    1    1    1    1    0
[67,]    0    0    0    1    1    1    0    1
[68,]    0    0    0    1    1    0    1    1
[69,]    0    0    0    1    0    1    1    1
[70,]    0    0    0    0    1    1    1    1

There will be total of 70 distinct outcomes as shown above. Each row shows a distinct arrangement of treatment and control conditions. The expression 1 refers to the treatment conditions and 0 refers to non-treatment aka. control conditions. It’s just the binary representation of whether somebody was assigned into a treatment or control group.

Remember: These 1 and 0 refers to the treatment and control cases. We can calculate average of treatment and average of control values from this table using the SHARP NULL hypothesis. If so desired, we use the formula:

First lets look at the row 1: it has [1,1,1,1,0,0,0,0] values. Meaning, four treatment values come at first in consecutive order followed by the four control values.

Do we add 0 and 1’s together, then? No, we add the values represented by the ones and zeros. In this case the treatment effect is provided in 1st, 2nd, 3rd, and 4th consecutive order, while the control effect are in 5th, 6th, 7th, and 8th orders, respectively.

So, where are the values, then? They are provided below in the “TandS_Table”. Now,

Calculating Treatment and Control Average for Row 1

A. Treatment_average for row 1 = sum of satisfaction value of 1st, 2nd, 3rd, and 4th scenario / 4*

Treatment_average for row 1 = (0.462 + 0.731 + 0.571 + 0.923)/4

Treatment_average for row 1 = 2.687 /4

Treatment_average for row 1 = 0.67175

B. Control_average for row 1 = sum of satisfaction value of 5th, 6th, 7th, and 8th scenario / 4*

Control_average for row 1 = (0.333 + 0.750 + 0.893 + 0.692)/4

Control_average for row 1 = 2.668 /4

Control_average for row 1 = 0.667

B. test_statistics for row 1

test_statistics for row 1 = Treatment_average for row 1 MINUS Control_average for row 1

test_statistics for row 1 = 0.67175 - 0.667

test_statistics for row 1 = 0.00475

How about the row 65 which has the arrangement in [0 0 1 0 0 1 1 1] order

Here, 3rd, 6th, 7th, and 8th items are treatment items and 1st, 2nd, 4th, and 5th items are control items

Treatment_average for row 65 = (0.571 + 0.750 + 0.893 + 0.692)/4 = 2.906 /4 = 0.7265

Control_average for row 65 = (0.462 + 0.731 + 0.923 + 0.333)/4 = 2.449 /4 = 0.61225

test_statistics for row 1 (difference) = 0.7265 - 0.61225 = 0.11425

To make it more concrete, we now have to take an example. Let’s assume we selected 8 teachers for a random trial of a new software to help them teach the concept of ‘Ratio’ in grade 9, let’s call it Ratio-software for our convenience. We randomly put them either in treatment or a control group and each group had 4 teachers. Once they completed teaching “Ratio”, we measured their effectiveness (How? We don’t need to know.). The calculation showed the following level of satisfaction rates among teachers:

TandS_Table <- data.frame(Treatment = c("0", "1", "0", "0", "0", "1", "1", "1"),
                          Satisfaction = c("0.462", "0.731", "0.571", "0.923", "0.333", "0.750", "0.893", "0.692"))
TandS_Table

  Treatment Satisfaction
1         0        0.462
2         1        0.731
3         0        0.571
4         0        0.923
5         0        0.333
6         1        0.750
7         1        0.893
8         1        0.692

Or, the same table can be generated as a matrix using the below syntax:

Now, lets go further and calculate the Fisher’s exact P-value. To do so, we have to turn the satisfaction rates into a [8 X 1] matrix.

Satisfaction <- matrix(c(0.462, 0.731, 0.571, 0.923, 0.333, 0.750, 0.893, 0.692), nrow=8, ncol=1, byrow=TRUE)
print(Satisfaction)

      [,1]
[1,] 0.462
[2,] 0.731
[3,] 0.571
[4,] 0.923
[5,] 0.333
[6,] 0.750
[7,] 0.893
[8,] 0.692

Next step is to measure the average satisfaction of the teachers in the treatment group and the control group. I don’t want to print all 70 rows in both treatment and control groups, so, I want to print 6 in each group just to make sure my code worked.

# Measures the Average Satisfaction in Treatment Group
treatment_avg <- (1/4)*perms%*%Satisfaction
head(treatment_avg)

        [,1]
[1,] 0.67175
[2,] 0.52425
[3,] 0.62850
[4,] 0.66425
[5,] 0.61400
[6,] 0.61225

# Measures the Average Satisfaction in Control Group
control_avg <- (1/4)*(1-perms)%*%Satisfaction
head(control_avg)

        [,1]
[1,] 0.66700
[2,] 0.81450
[3,] 0.71025
[4,] 0.67450
[5,] 0.72475
[6,] 0.72650

Next, I am going to calculate test statistics by subtracting average control values from average treatment values. The abs function is the short form of the absolute value, which gives us the absolute value of the desired calculations.

test_statistic <- abs(treatment_avg - control_avg)
head(test_statistic)

        [,1]
[1,] 0.00475
[2,] 0.29025
[3,] 0.08175
[4,] 0.01025
[5,] 0.11075
[6,] 0.11425

Once, I have the average difference between treatement and control cases for all 70 possible combinations, I need to return a list of values I had in the perms matrix and apply the function of treatment or control to margins of an array of matrix.

rownumber <- apply(apply(perms, 1, 
                         function(x) (x == c(0, 1, 0, 0, 0, 1, 1, 1))), 
                   2, sum)
rownumber <- (rownumber == 8)

observed_test <- test_statistic[rownumber == TRUE]
observed_test

[1] 0.19425

We can also hand calculate the value:

observed_test = (SUM(Treatment 1:4) - SUM(Control 1:4))/4

= ((0.731+0.750+0.893+0.692) - (0.462+0.571+0.923+0.333))/4

= (3.066 - 2.289)/4

= 0.777/4

= 0.19425

This is the cutoff satisfaction level. Any value of satisfaction among the teachers above this, is supposed to be the positively higher satisfaction.

Now, Lets figure out how many average satisfaction values are over these values.

larger_than_observed <- (test_statistic >= observed_test)
head(larger_than_observed)

      [,1]
[1,] FALSE
[2,]  TRUE
[3,] FALSE
[4,] FALSE
[5,] FALSE
[6,] FALSE

R compared all (70) the average satisfaction statistics with the value we just figured out and gave us whether or not these individual values exceeded our value. “True” refers to the values that are higher than 0.19425 and “False” refers to the values otherwise.

We can hand calculate the total “True” and “False” or we can just say R to sum the values for us.

# numbers in which the statistic exceeds or is equal to the value in the observed date
sum(larger_than_observed)

[1] 16

# numbers in which the statistic exceeds or is equal to the value in the observed date
sum(!larger_than_observed)

[1] 54

There were 16 instances where the satisfaction level among teachers were higher than the cutoff value.

Once we have these values, we are ready to calculate Fisher’s Exact p-Value in this case. Here’s how we do it:

Fisher’s Exact p-Value = total satisfaction level higher than cutoff level / total cases

= 16/70

= 0.228571

Finally let’s create a data frame and put all the values we calculated so far.

combined_table <- data.frame(perms,control_avg,treatment_avg,test_statistic)
head(combined_table)

  X1 X2 X3 X4 X5 X6 X7 X8 control_avg treatment_avg test_statistic
1  1  1  1  1  0  0  0  0     0.66700       0.67175        0.00475
2  1  1  1  0  1  0  0  0     0.81450       0.52425        0.29025
3  1  1  1  0  0  1  0  0     0.71025       0.62850        0.08175
4  1  1  1  0  0  0  1  0     0.67450       0.66425        0.01025
5  1  1  1  0  0  0  0  1     0.72475       0.61400        0.11075
6  1  1  0  1  1  0  0  0     0.72650       0.61225        0.11425

As we saw above, having 8 subjects provide us with 70 different combinations of treatment and control cases. When the sample size keeps getting bigger, hand-calculating treatment and control effect and measuring the impact of treatment gets harder. It is essential to note that Fisher’s Exact P-value is a non-parametric estimate. Meaning, it is useful only when we have small sample size, and it does not make us confident that our sample size is the representative of the population we want to draw on. If we have a large enough sample size we simply don’t worry about this test because other tools provide much accurate and more powerful results than this.

Large Sample Size

However, let’s see how this process looks when we have comparatively larger sample size.

simul_stat <- as.vector(NULL)
my_data <- read.csv('teachers_final.csv')
str(my_data)

'data.frame':   100 obs. of  5 variables:
 $ teacherscore  : num  32 27 38 39 50 19.5 42 38 43 14 ...
 $ treatment     : int  0 0 0 1 0 0 0 0 1 1 ...
 $ teachergrade  : int  9 8 9 9 12 8 9 11 9 8 ...
 $ pctpostwritten: num  0.4 0.65 0.304 0.593 0.573 ...
 $ open          : num  0.538 0.852 0.308 0.889 0.815 ...

There are 5 variables (teacherscore, treatment, teachergrade, pctpostwritten, & open) and 100 observatations in my_data. Of these variables, teacherscore, pctpostwritten, & open take numerical values, while treatment and teachgrade are dummy coded categorical values, thus, they are discrete variables.

I want to make sure that I know how many of the schools were treatment schools and how many of them were control.

sum(my_data$treatment)

[1] 49

There are total of 49 treatment schools and 51 control schools. Now, lets create a function that calculates the treatment and control effect for all possible combinations.

set.seed(1001)
for(i in 1:100) {
  #print(i)
  my_data$rand <- runif(100,min = 0,max = 1)#creates 100 random uniform observations between 0 and 1
  my_data$treatment_rand <- as.numeric(rank(my_data$rand) <= 49)# Randomly selects maximum of 49 place holders in treatment and 51 values in control groups
  my_data$control_rand = 1 - my_data$treatment_rand #creates 51 placeholders for control cases
  simul_stat <- append(simul_stat,sum(my_data$treatment_rand*my_data$open)/sum(my_data$treatment_rand) - sum(my_data$control_rand*my_data$open)/sum(my_data$control_rand))
}

my_data$control = 1 - my_data$treatment #calculates effect on control schools
my_data$control

  [1] 1 1 1 0 1 1 1 1 0 0 0 1 0 0 1 1 1 1 1 0 1 1 0 1 0 1 0 0 0 0 0 1 0 0 1 1 1
 [38] 1 1 0 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 1 1 0 0 0 1 1 1 1 0
 [75] 0 1 0 0 1 1 0 1 0 0 1 1 0 0 0 1 1 0 0 1 1 1 0 1 1 1

# creating actual statistics from my_data
actual_stat <- sum(my_data$treatment*my_data$open)/sum(my_data$treatment) - sum(my_data$control*my_data$open)/sum(my_data$control)
actual_stat

[1] 0.1969086

#Checking if the value from simulation is bigger than the value obtained from the real data
sum(abs(simul_stat) >= actual_stat)/NROW(simul_stat)

[1] 0

The result shows that there is not.

Printing the Average Treatment Effect(ATE)

Don’t get confused that it is just the average of subtraction of control values from treatment values. This value on the other hand is the cutoff value. Any value over this will be used to calculate the Fisher’s Exact P-value.

ate <- actual_stat
ate

[1] 0.1969086

Now, lets calculate the average of average mean of control schools. Please refer to the tables above to understand what it means.

control_mean <- sum(my_data$control*my_data$open)/sum(my_data$control)
control_mean

[1] 0.5928369

And, the average of average treatment effect:

treatment_mean <- sum(my_data$treatment*my_data$open)/sum(my_data$treatment)
treatment_mean

[1] 0.7897455

Treatment mean is approximately 0.79. Which is 0.1969086 higher than control mean.

And, here I am calculating the difference in sample average between control and treatment cases, which gives us the unbiased estimate of the treatment effect. The formula is

tau_hat = 1/Total Treatment X sum of all treatment values MINUS 1/Total Control X sum of all control values

The sum of the control variance is denoted by s_c and the sum of the treatment variance is denoted by s_t and they can be calculated the following way:

# average variance in control cases
s_c <- (1/(sum(my_data$control)-1))*sum(((my_data$open-control_mean)*my_data$control)^2)
s_c

[1] 0.03117632

#average variance in treatment cases
s_t <- (1/(sum(my_data$treatment)-1))*sum(((my_data$open-treatment_mean)*my_data$treatment)^2)
s_t

[1] 0.01578075

That worked. We calculated the variance for both control and treatment schools. Now, I am going to estimate the upper bound of the standard error of the point using the Neyman inference, which is often known as the conservative estimator of sampling standard deviation. It can be calculated using: sq root of v hat

Vneyman <- (s_c/sum(my_data$control) + s_t/sum(my_data$treatment))
print(sqrt(Vneyman))#This value is the conservative estimator of sampling SD

[1] 0.03055088

We can calculate the t-value using the actual_state, i.e., 0.1969086 by the Vneyman value, i.e., 0.03055088.

print(actual_stat/sqrt(Vneyman))

[1] 6.445268

The t-statistics is 6.445268 which is much bigger than the critical value 1.96. Thus, the treatment effect is statistically significantly bigger than a zero.

We now have all the required statistics to be able to calculate the 95% Confidence Interval:

#Calculating 95% Confidence Interval
print(actual_stat-1.96*sqrt(Vneyman))#Lower limit

[1] 0.1370289

print(actual_stat+1.96*sqrt(Vneyman))#Upper limit

[1] 0.2567884

Based on the results, the average mean of the treatment group lies between 0.1270289 and 0.2567884. It does not include zero, thus, we reject the null hypothesis that there is no difference in the mean scores between the students in the treatment and control schools.

Calculating kernel Regression Using Relationship between pctpostwritten and open

‘pctpostwritten’ variable shows students’ average test scores after the treatment, while the variable ‘open’ records the total number of days the schools were open during the study period. My intention is to check if the students’ test scores after treatment had something to do with the days the teaching/learning took place.

The intention of generating 4 plots with same specification but different bandwidth is to check which bandwidth gives most appropriate plot based on the data. We don’t want our plot to be too jagged nor do we want them to be too smooth. The prior is susceptible even for the small variation in the data, while the latter misses the overall quality of data.

library(np)
attach(my_data)
plot0.001 <- npreg(xdat = my_data$open, ydat = my_data$pctpostwritten, bws = 0.001,bandwidth.compute = FALSE)
plot(plot0.001)

This plot definitely have a lot of fluctuation. It is easily influenced by the noise in the data. It is hard to see the clear relationship between the aforementioned variables. I will not vote for this one.

plot0.04 <- npreg(xdat = my_data$open, ydat = my_data$pctpostwritten, bws = 0.04,bandwidth.compute = FALSE)
plot(plot0.04)

This plot with bandwidth 0.04 is much better. It has been responsive to the changes in the data and the relationship is clear. Good plot.

plot1.0 <- npreg(xdat = my_data$open, ydat = my_data$pctpostwritten, bws = 1,bandwidth.compute = FALSE)
plot(plot1.0)

This plot (with bandwidth 1.0) is weird. It doesn’t actually capture the essence of our data. It shows a linear relationship but our distribution as seen in above plots are not as soomth. We don’t want to go for this.

Finally, lets create a plot with bandwidth of 20.0.

plot20.0 <- npreg(xdat = my_data$open, ydat = my_data$pctpostwritten, bws = 20,bandwidth.compute = FALSE)
plot(plot20.0)

This figure is even smoother. A clear linear relationship but misses so many important aspect in our data and the distribution. We don’t want it.

Based on the plots, we might want to go with the plot having bandwidth 0.04.

Now let’s check once again by putting them side by side.

# Showing Plots Side by Side
par(mfrow=c(2,2))
plot(plot0.001, main = "0.001 Bandwith")
plot(plot0.04, main = "0.04 Bandwith")
plot(plot1.0, main = "1.0 Bandwith")
plot(plot20.0, main = "20.0 Bandwith")

——————————–Thank You——————