This notebook presents the data and calculations used to evaluate whether there is a statistically significant difference between the performance of students who were exposed to the automated feedback of the RAP system and the performance of students in the control group (practice only, with no automated feedback).
Data
The data used in this analysis are an anonymized version of the data collected during the final measurement. The file can be downloaded from https://drive.google.com/open?id=1yNt3AH77prTaRi2rI1vBvswSYK6FU1yC.
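# Read Group as a factor so that summary(data$Group) reports the group counts
data <- read.csv("final_measurement.csv", stringsAsFactors = TRUE)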
head(data)
The “Score” column contains the final score assigned by the human expert (the professor). The “Group” column indicates whether the student was in the Control or the Case group.
We can also obtain basic statistics about the students in the different groups:
summary(data$Group)
   Case Control
     85      95
summary(data[data$Group=="Case",]$Score)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  13.00   16.00   18.00   17.15   18.00   20.00
sd(data[data$Group=="Case",]$Score)
[1] 1.867682
summary(data[data$Group=="Control",]$Score)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  12.00   15.00   17.00   16.62   18.00   20.00
sd(data[data$Group=="Control",]$Score)
[1] 2.297913
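The per-group summaries above can also be collected into a single table. The following is a minimal sketch that uses a small made-up data frame (`toy`, with hypothetical scores) in place of `final_measurement.csv`, so the numbers are illustrative only and are not the study data.

```r
# Hypothetical toy data standing in for final_measurement.csv (not the study data)
toy <- data.frame(
  Group = factor(c("Case", "Case", "Case", "Control", "Control", "Control")),
  Score = c(18, 17, 19, 15, 17, 16)
)

# Number of students in each group
group_n <- table(toy$Group)

# Mean and standard deviation of Score within each group
group_mean <- tapply(toy$Score, toy$Group, mean)
group_sd   <- tapply(toy$Score, toy$Group, sd)

# Combine the results into one table with one row per group
group_stats <- data.frame(
  n    = as.vector(group_n),
  mean = as.vector(group_mean),
  sd   = as.vector(group_sd),
  row.names = names(group_mean)
)
print(group_stats)
```

Replacing `toy` with `data` would give the same quantities as the separate `summary()` and `sd()` calls, assuming the file has the `Group` and `Score` columns described above.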
Visualization
The distribution of scores is shown in a boxplot, with the Case and Control groups plotted separately and the individual scores overlaid as jittered points.
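library(hrbrthemes)  # theme_ipsum()
library(viridis)     # scale_fill_viridis()
library(ggplot2)

# Boxplot of Score by Group; data is passed to ggplot() directly,
# so no pipe operator (and no extra package) is needed
ggplot(data, aes(x = Group, y = Score, fill = Group)) +
  geom_boxplot() +
  scale_fill_viridis(discrete = TRUE, alpha = 0.6) +
  geom_jitter(color = "black", size = 0.4, alpha = 0.9) +
  theme_ipsum() +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 11)
  ) +
  ggtitle("") +
  xlab("")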

Difference Test
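To test the difference between the means, we use a one-sided Welch two-sample t-test (unpaired, unequal variances) with the alternative hypothesis that the mean score in the Case group is greater than the mean score in the Control group. Because the factor levels are ordered alphabetically, "greater" here means Case minus Control is greater than 0.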
# The formula interface compares independent (unpaired) samples by default
t.test(Score ~ Group, data = data, alternative = "greater")
Welch Two Sample t-test
data: Score by Group
t = 1.7111, df = 176.42, p-value = 0.04441
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
0.01790396 Inf
sample estimates:
   mean in group Case mean in group Control
             17.15294              16.62105
Although the boxplots overlap considerably, the one-sided test indicates that the mean score of the Case group is higher than that of the Control group by a small but statistically significant margin at the 5% level (p = 0.044).
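As a cross-check, the test statistic can be recomputed by hand from the group sizes, means, and standard deviations printed earlier in this notebook. This is a minimal sketch of the standard Welch formulas, not the internals of `t.test`; the variable names are illustrative, and because the inputs are the rounded values shown above, the results should only match the reported output to within rounding.

```r
# Summary statistics as printed earlier in this notebook (rounded as shown there)
n_case <- 85; mean_case <- 17.15294; sd_case <- 1.867682
n_ctrl <- 95; mean_ctrl <- 16.62105; sd_ctrl <- 2.297913

# Variance of each group mean
v_case <- sd_case^2 / n_case
v_ctrl <- sd_ctrl^2 / n_ctrl

# Welch t statistic: difference in means over its standard error
se     <- sqrt(v_case + v_ctrl)
t_stat <- (mean_case - mean_ctrl) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_case + v_ctrl)^2 /
  (v_case^2 / (n_case - 1) + v_ctrl^2 / (n_ctrl - 1))

# One-sided p-value for the alternative mean(Case) > mean(Control)
p_value <- pt(t_stat, df = df_welch, lower.tail = FALSE)

# Lower bound of the one-sided 95% confidence interval
ci_lower <- (mean_case - mean_ctrl) - qt(0.95, df = df_welch) * se

print(c(t = t_stat, df = df_welch, p = p_value, ci_lower = ci_lower))
```

The printed values should land close to the t, df, p-value, and lower confidence bound in the `t.test` output above.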
Effect Size
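To estimate the relative strength of the intervention, we calculate the effect size using Cohen's d. The estimate is d = 0.25, which is conventionally classed as a small effect. Note that its two-sided 95% confidence interval (-0.04 to 0.55) includes zero, so the effect size is estimated with considerable uncertainty.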
suppressMessages(library(effsize))
case <- data[data$Group == "Case", ]$Score
control <- data[data$Group == "Control", ]$Score
cohen.d(case, control)
Cohen's d
d estimate: 0.252575 (small)
95 percent confidence interval:
      lower       upper
-0.04322303  0.54837310
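The point estimate can be reproduced from the summary statistics printed earlier in this notebook. The sketch below assumes the pooled-standard-deviation form of Cohen's d, which is consistent with the estimate printed above; the variable names are illustrative, and the inputs are the rounded values shown earlier, so the result should agree only to within rounding.

```r
# Summary statistics as printed earlier in this notebook (rounded as shown there)
n_case <- 85; mean_case <- 17.15294; sd_case <- 1.867682
n_ctrl <- 95; mean_ctrl <- 16.62105; sd_ctrl <- 2.297913

# Pooled standard deviation, weighting each group's variance by its degrees of freedom
pooled_sd <- sqrt(
  ((n_case - 1) * sd_case^2 + (n_ctrl - 1) * sd_ctrl^2) /
    (n_case + n_ctrl - 2)
)

# Cohen's d: difference in means expressed in pooled standard deviation units
d <- (mean_case - mean_ctrl) / pooled_sd
print(d)
```

In other words, the Case group scored roughly a quarter of a pooled standard deviation higher than the Control group.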