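Assumes the libraries tidyverse, descriptr, and gridExtra; the correlation and clustering parts at the end additionally use Hmisc (rcorr) and factoextra (fviz_dend).
df <- read.csv("S2_Pre_Post_full.csv", header = TRUE)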
df$ROLE <- factor(df$ROLE, levels = c(1,9), labels = c("tutor_first", "tutee_first"))
The maximum score in the test is
ds_summary_stats(df,PRE.SCORE)
ggplot(df, aes(PRE.SCORE)) + geom_bar()
ggplot(df, aes(PRE.SCORE)) +
geom_histogram(bins = 10)
ggplot(df, aes(x = 1, y = PRE.SCORE)) +
geom_boxplot() +
scale_x_continuous(breaks = NULL) +
theme(axis.title.x = element_blank())
To interpret changes due to ROLE later: is there a difference in the pre-test between students who were subsequently in the tutor_first or tutee_first role?
ggplot(df, aes(x = ROLE, y = PRE.SCORE)) +
geom_boxplot() +
xlab("Tutor role")
While the tutor_first group scores slightly higher (mean 13.52 vs. 12.57), the difference is likely due to chance. A Welch t-test agrees, with p = 0.64, well above 0.05.
t.test(df$PRE.SCORE ~df$ROLE)
Welch Two Sample t-test
data: df$PRE.SCORE by df$ROLE
t = 0.47185, df = 43.496, p-value = 0.6394
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.130335 5.043378
sample estimates:
mean in group tutor_first mean in group tutee_first
13.52174 12.56522
The non-parametric Wilcoxon test likewise finds no significant difference between the two groups:
wilcox.test(df$PRE.SCORE ~df$ROLE)
cannot compute exact p-value with ties
Wilcoxon rank sum test with continuity correction
data: df$PRE.SCORE by df$ROLE
W = 295, p-value = 0.5092
alternative hypothesis: true location shift is not equal to 0
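The warning about ties appears because an exact Wilcoxon p-value is only available when there are no tied ranks, so R falls back to a normal approximation. The following is a minimal sketch, using made-up scores (the vectors `a` and `b` are hypothetical and not the study data), of requesting that approximation explicitly with `exact = FALSE`, which should give the same p-value without the warning:

```r
# Hypothetical scores for two groups; the repeated values create ties.
a <- c(12, 15, 15, 9, 20, 13, 15, 8)
b <- c(10, 15, 9, 9, 14, 13, 7, 11)

# Default call: R attempts an exact p-value, warns about the ties,
# and falls back to the normal approximation.
res_default <- suppressWarnings(wilcox.test(a, b))

# Explicitly request the normal approximation (no warning is raised).
res_approx <- wilcox.test(a, b, exact = FALSE)

res_approx$p.value
```

Applied to the data here, the call would presumably be `wilcox.test(PRE.SCORE ~ ROLE, data = df, exact = FALSE)`.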
ds_summary_stats(df,POST.SCORE)
────────────────────────────────────────────────── Variable: POST.SCORE ──────────────────────────────────────────────────
Univariate Analysis
N 46.00 Variance 49.79
Missing 0.00 Std Deviation 7.06
Mean 17.89 Range 23.00
Median 19.50 Interquartile Range 11.50
Mode 17.00 Uncorrected SS 16965.00
Trimmed Mean 18.05 Corrected SS 2240.46
Skewness -0.52 Coeff Variation 39.44
Kurtosis -0.98 Std Error Mean 1.04
Quantiles
Quantile Value
Max 28.00
99% 27.55
95% 26.75
90% 26.00
Q3 23.75
Median 19.50
Q1 12.25
10% 7.00
5% 5.25
1% 5.00
Min 5.00
Extreme Values
Low High
Obs Value Obs Value
6 5 35 28
13 5 37 27
19 5 39 27
5 6 3 26
32 7 9 26
ggplot(df, aes(POST.SCORE)) + geom_bar()
ggplot(df, aes(POST.SCORE)) +
geom_histogram(bins = 10)
Check: we’d like to know if the 4 people with the low values are the same from pre-test to post-test. That would indicate non-engagement.
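One possible way to run this check is to list the students who fall below a cut-off on both tests and intersect the two lists. The sketch below uses a small made-up data frame `toy` in place of `df`, and the cut-off of 7 is an assumption chosen for illustration only:

```r
# Hypothetical stand-in for df; only the three columns used here.
toy <- data.frame(
  Stdcode    = c("A1", "A2", "A3", "A4", "A5", "A6"),
  PRE.SCORE  = c(4, 15, 5, 20, 6, 12),
  POST.SCORE = c(5, 22, 19, 25, 6, 7),
  stringsAsFactors = FALSE
)

# Assumed cut-off for a "low" score; adjust to the scale of the test.
cutoff <- 7

low_pre  <- toy$Stdcode[toy$PRE.SCORE  <= cutoff]
low_post <- toy$Stdcode[toy$POST.SCORE <= cutoff]

# Students who are low on both occasions.
low_both <- intersect(low_pre, low_post)
low_both
```

With the toy data, students A1 and A5 are low on both tests, while A3 starts low and improves.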
ggplot(df, aes(x = 1, y = POST.SCORE)) +
geom_boxplot() +
scale_x_continuous(breaks = NULL) +
theme(axis.title.x = element_blank())
By role:
ggplot(df, aes(x = ROLE, y = POST.SCORE)) +
geom_boxplot() +
xlab("Tutor role")
By inspection, the difference between the two conditions is marginal, also keeping in mind that the pre-test scores were slightly elevated for the tutor_first condition. A t-test reveals no significant difference.
t.test(df$POST.SCORE ~df$ROLE)
Welch Two Sample t-test
data: df$POST.SCORE by df$ROLE
t = 0.35175, df = 43.977, p-value = 0.7267
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.495776 4.974037
sample estimates:
mean in group tutor_first mean in group tutee_first
18.26087 17.52174
In the further analysis we treat the two groups as comparable.
With the pre-test mean at 13.04 and the post-test mean at 17.89, the intervention was clearly effective:
t.test(df$POST.SCORE, df$PRE.SCORE, paired = TRUE)
Paired t-test
data: df$POST.SCORE and df$PRE.SCORE
t = 5.7218, df = 45, p-value = 8.067e-07
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
3.141365 6.554287
sample estimates:
mean of the differences
4.847826
Looking at the individual gain scores, we see that while most are positive, and the negative values are few and mostly small, the loss of 11 points for the student in position 8 (S2-AM-9) is rather extreme.
gain <- df$POST.SCORE - df$PRE.SCORE
gain
[1] 7 4 4 0 1 -1 6 -11 8 6 14 8 -3 11 3 -3 9 -3 -2 2 3 2 15 3 5 12 4 5 15
[30] 13 5 3 12 7 1 1 -1 5 10 15 0 1 10 8 11 -2
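To locate the cases that need a case-by-case explanation, the gain vector can be queried directly. The sketch below copies the 46 gain scores printed above (so it runs without the CSV file) and also recomputes the paired t-test as a one-sample t-test on the gains, which is the same test:

```r
# Gain scores as printed above (POST.SCORE - PRE.SCORE, 46 students).
gain <- c(7, 4, 4, 0, 1, -1, 6, -11, 8, 6, 14, 8, -3, 11, 3, -3, 9, -3, -2,
          2, 3, 2, 15, 3, 5, 12, 4, 5, 15, 13, 5, 3, 12, 7, 1, 1, -1, 5, 10,
          15, 0, 1, 10, 8, 11, -2)

# Row positions of students with no gain and with a loss.
no_gain <- which(gain == 0)
losses  <- which(gain < 0)

# Row position of the most extreme loss.
worst <- which.min(gain)

# A paired t-test is a one-sample t-test on the differences.
t_gain <- t.test(gain)

no_gain
losses
worst
```

This gives two students with zero gain (positions 4 and 41) and eight with a loss (positions 6, 8, 13, 16, 18, 19, 37, and 46), with position 8 the most extreme.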
This can be explained by this student not actually engaging with the post-test….
Given that the students had no or very little initial knowledge, losses in the post-test need to be explained on a case-by-case basis. We do this in three sections:
No gain or loss likely means non-engagement. This pattern holds for these cases:
S2-AM-9:
Others?
df$P1Q1
[1] 1 2 5 6 1 2 2 2 4 2 2 5 2 3 6 6 5 4 3 2 2 4 0 6 2 2 6 5 2 1 3 1 3 3 5 0 6 2 5 2 1 4 4 2 1 3
df$POSTP1Q1
[1] 2 2 6 6 0 1 2 0 6 3 2 5 0 5 6 6 6 5 3 2 2 6 4 6 2 6 5 2 6 6 4 3 6 4 6 2 5 2 5 6 0 6 6 5 4 1
ds_summary_stats(df,P1Q1, POSTP1Q1 )
──────────────────────────────────────────── Variable: P1Q1 ───────────────────────────────────────────
Univariate Analysis
N 46.00 Variance 3.11
Missing 0.00 Std Deviation 1.76
Mean 3.04 Range 6.00
Median 2.50 Interquartile Range 2.75
Mode 2.00 Uncorrected SS 566.00
Trimmed Mean 3.05 Corrected SS 139.91
Skewness 0.34 Coeff Variation 57.94
Kurtosis -0.99 Std Error Mean 0.26
Quantiles
Quantile Value
Max 6.00
99% 6.00
95% 6.00
90% 6.00
Q3 4.75
Median 2.50
Q1 2.00
10% 1.00
5% 1.00
1% 0.00
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
23 0 4 6
36 0 15 6
1 1 16 6
5 1 24 6
30 1 27 6
────────────────────────────────────────── Variable: POSTP1Q1 ─────────────────────────────────────────
Univariate Analysis
N 46.00 Variance 4.34
Missing 0.00 Std Deviation 2.08
Mean 3.87 Range 6.00
Median 4.50 Interquartile Range 4.00
Mode 6.00 Uncorrected SS 884.00
Trimmed Mean 3.95 Corrected SS 195.22
Skewness -0.47 Coeff Variation 53.83
Kurtosis -1.19 Std Error Mean 0.31
Quantiles
Quantile Value
Max 6.00
99% 6.00
95% 6.00
90% 6.00
Q3 6.00
Median 4.50
Q1 2.00
10% 1.00
5% 0.00
1% 0.00
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
5 0 3 6
8 0 4 6
13 0 9 6
41 0 15 6
6 1 16 6
boxplot(df$P1Q1, data = df)
boxplot(df$POSTP1Q1, data = df)
ggplot(df, aes(P1Q1)) + geom_bar()
ggplot(df, aes(POSTP1Q1)) + geom_bar()
df$P1Q2
[1] 0 4 4 4 0 0 4 4 0 4 0 4 1 4 4 4 0 4 0 3 0 0 0 4 0 0 4 4 0 4 1 0 0 1 4 1 4 0 4 0 0 0 2 3 4 0
df$POSTP1Q2
[1] 4 4 4 4 0 0 4 4 4 3 4 4 0 4 4 4 4 4 0 3 0 4 3 4 1 4 4 4 4 4 1 0 4 4 4 1 4 2 4 4 0 0 4 4 4 1
ds_summary_stats(df,P1Q2, POSTP1Q2)
──────────────────────────────────────────── Variable: P1Q2 ───────────────────────────────────────────
Univariate Analysis
N 46.00 Variance 3.59
Missing 0.00 Std Deviation 1.90
Mean 1.91 Range 4.00
Median 1.00 Interquartile Range 4.00
Mode 0.00 Uncorrected SS 330.00
Trimmed Mean 1.90 Corrected SS 161.65
Skewness 0.11 Coeff Variation 99.07
Kurtosis -1.96 Std Error Mean 0.28
Quantiles
Quantile Value
Max 4.00
99% 4.00
95% 4.00
90% 4.00
Q3 4.00
Median 1.00
Q1 0.00
10% 0.00
5% 0.00
1% 0.00
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
1 0 2 4
5 0 3 4
6 0 4 4
9 0 7 4
11 0 8 4
────────────────────────────────────────── Variable: POSTP1Q2 ─────────────────────────────────────────
Univariate Analysis
N 46.00 Variance 2.64
Missing 0.00 Std Deviation 1.62
Mean 2.93 Range 4.00
Median 4.00 Interquartile Range 2.75
Mode 4.00 Uncorrected SS 515.00
Trimmed Mean 3.02 Corrected SS 118.80
Skewness -1.06 Coeff Variation 55.36
Kurtosis -0.70 Std Error Mean 0.24
Quantiles
Quantile Value
Max 4.00
99% 4.00
95% 4.00
90% 4.00
Q3 4.00
Median 4.00
Q1 1.25
10% 0.00
5% 0.00
1% 0.00
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
5 0 1 4
6 0 2 4
13 0 3 4
19 0 4 4
21 0 7 4
ggplot(df, aes(P1Q2)) + geom_bar()
ggplot(df, aes(POSTP1Q2)) + geom_bar()
The pre-test has a very odd distribution: check!
df$P1Q3
[1] 2 2 2 2 2 2 3 3 3 2 2 1 2 2 2 2 2 1 2 2 2 3 1 2 1 3 2 1 2 1 0 1 2 2 3 2 3 0 2 2 1 2 2 2 2 2
df$POSTP1Q3
[1] 2 2 2 2 1 2 3 3 3 2 2 1 2 2 2 2 2 1 2 2 2 3 2 3 1 2 2 2 2 2 2 2 2 2 3 2 3 1 3 2 2 2 2 2 3 2
ds_summary_stats(df,P1Q3, POSTP1Q3 )
──────────────────────────────────────────── Variable: P1Q3 ───────────────────────────────────────────
Univariate Analysis
N 46.00 Variance 0.50
Missing 0.00 Std Deviation 0.71
Mean 1.89 Range 3.00
Median 2.00 Interquartile Range 0.00
Mode 2.00 Uncorrected SS 187.00
Trimmed Mean 1.93 Corrected SS 22.46
Skewness -0.63 Coeff Variation 37.35
Kurtosis 0.99 Std Error Mean 0.10
Quantiles
Quantile Value
Max 3.00
99% 3.00
95% 3.00
90% 3.00
Q3 2.00
Median 2.00
Q1 2.00
10% 1.00
5% 1.00
1% 0.00
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
31 0 7 3
38 0 8 3
12 1 9 3
18 1 22 3
23 1 26 3
────────────────────────────────────────── Variable: POSTP1Q3 ─────────────────────────────────────────
Univariate Analysis
N 46.00 Variance 0.30
Missing 0.00 Std Deviation 0.55
Mean 2.09 Range 2.00
Median 2.00 Interquartile Range 0.00
Mode 2.00 Uncorrected SS 214.00
Trimmed Mean 2.10 Corrected SS 13.65
Skewness 0.06 Coeff Variation 26.39
Kurtosis 0.44 Std Error Mean 0.08
Quantiles
Quantile Value
Max 3.00
99% 3.00
95% 3.00
90% 3.00
Q3 2.00
Median 2.00
Q1 2.00
10% 1.50
5% 1.00
1% 1.00
Min 1.00
Extreme Values
Low High
Obs Value Obs Value
5 1 7 3
12 1 8 3
18 1 9 3
25 1 22 3
38 1 24 3
boxplot(df$P1Q3, data = df)
boxplot(df$POSTP1Q3, data = df)
ggplot(df, aes(P1Q3)) + geom_bar()
ggplot(df, aes(POSTP1Q3)) + geom_bar()
df$P1Q4
[1] 1 2 2 2 0 0 2 2 2 2 0 2 1 2 2 2 1 2 0 0 0 0 0 2 0 0 1 0 2 2 1 0 2 2 2 2 2 0 2 2 2 0 2 0 1 1
df$POSTP1Q4
[1] 2 2 2 2 2 0 2 2 2 1 2 2 2 2 2 2 2 2 0 2 0 2 1 2 0 2 2 1 2 2 0 0 2 2 2 1 2 0 2 2 2 0 2 0 2 1
ds_summary_stats(df,P1Q4, POSTP1Q4 )
──────────────────────────────────────────── Variable: P1Q4 ───────────────────────────────────────────
Univariate Analysis
N 46.00 Variance 0.83
Missing 0.00 Std Deviation 0.91
Mean 1.20 Range 2.00
Median 2.00 Interquartile Range 2.00
Mode 2.00 Uncorrected SS 103.00
Trimmed Mean 1.21 Corrected SS 37.24
Skewness -0.41 Coeff Variation 76.08
Kurtosis -1.70 Std Error Mean 0.13
Quantiles
Quantile Value
Max 2.00
99% 2.00
95% 2.00
90% 2.00
Q3 2.00
Median 2.00
Q1 0.00
10% 0.00
5% 0.00
1% 0.00
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
5 0 2 2
6 0 3 2
11 0 4 2
19 0 7 2
20 0 8 2
────────────────────────────────────────── Variable: POSTP1Q4 ─────────────────────────────────────────
Univariate Analysis
N 46.00 Variance 0.66
Missing 0.00 Std Deviation 0.81
Mean 1.50 Range 2.00
Median 2.00 Interquartile Range 1.00
Mode 2.00 Uncorrected SS 133.00
Trimmed Mean 1.55 Corrected SS 29.50
Skewness -1.18 Coeff Variation 53.98
Kurtosis -0.38 Std Error Mean 0.12
Quantiles
Quantile Value
Max 2.00
99% 2.00
95% 2.00
90% 2.00
Q3 2.00
Median 2.00
Q1 1.00
10% 0.00
5% 0.00
1% 0.00
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
6 0 1 2
19 0 2 2
21 0 3 2
25 0 4 2
31 0 5 2
boxplot(df$P1Q4, data = df)
boxplot(df$POSTP1Q4, data = df)
ggplot(df, aes(P1Q4)) + geom_bar()
ggplot(df, aes(POSTP1Q4)) + geom_bar()
df$P2Q1
[1] 0 0 2 2 0 0 0 4 4 0 1 1 1 0 2 2 0 3 0 3 0 4 0 0 0 0 0 1 0 0 2 0 1 1 4 2 4 1 0 0 0 1
[43] 0 0 0 1
df$POSTP2Q1
[1] 2 2 4 2 0 0 2 1 2 3 4 4 1 3 4 2 2 1 0 2 1 4 3 2 2 1 3 0 4 4 2 0 4 1 4 2 4 1 4 3 0 1
[43] 3 0 2 0
ds_summary_stats(df,P2Q1, POSTP2Q1 )
──────────────────────────────────── Variable: P2Q1 ────────────────────────────────────
Univariate Analysis
N 46.00 Variance 1.84
Missing 0.00 Std Deviation 1.36
Mean 1.02 Range 4.00
Median 0.00 Interquartile Range 2.00
Mode 0.00 Uncorrected SS 131.00
Trimmed Mean 0.93 Corrected SS 82.98
Skewness 1.18 Coeff Variation 132.90
Kurtosis 0.19 Std Error Mean 0.20
Quantiles
Quantile Value
Max 4.00
99% 4.00
95% 4.00
90% 3.50
Q3 2.00
Median 0.00
Q1 0.00
10% 0.00
5% 0.00
1% 0.00
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
1 0 8 4
2 0 9 4
5 0 22 4
6 0 35 4
7 0 37 4
────────────────────────────────── Variable: POSTP2Q1 ──────────────────────────────────
Univariate Analysis
N 46.00 Variance 1.99
Missing 0.00 Std Deviation 1.41
Mean 2.09 Range 4.00
Median 2.00 Interquartile Range 2.00
Mode 2.00 Uncorrected SS 290.00
Trimmed Mean 2.10 Corrected SS 89.65
Skewness -0.01 Coeff Variation 67.63
Kurtosis -1.20 Std Error Mean 0.21
Quantiles
Quantile Value
Max 4.00
99% 4.00
95% 4.00
90% 4.00
Q3 3.00
Median 2.00
Q1 1.00
10% 0.00
5% 0.00
1% 0.00
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
5 0 3 4
6 0 11 4
19 0 12 4
28 0 15 4
32 0 22 4
boxplot(df$P2Q1, data = df)
boxplot(df$POSTP2Q1, data = df)
ggplot(df, aes(P2Q1)) + geom_bar()
ggplot(df, aes(POSTP2Q1)) + geom_bar()
df$P2Q2
[1] 0 2 4 4 0 0 2 4 1 0 0 0 0 0 2 4 0 3 0 1 0 1 0 4 0 0 3 0 0 0 1 0 4 0 4 2 4 0 0 1 0 0 1 0 0 0
df$POSTP2Q2
[1] 0 4 4 4 0 0 4 3 4 4 4 4 0 4 4 2 0 1 0 2 0 0 2 4 0 0 3 4 2 4 1 0 4 1 4 1 4 0 4 3 0 0 4 3 1 1
ds_summary_stats(df,P2Q2, POSTP2Q2 )
──────────────────────────────────────────── Variable: P2Q2 ───────────────────────────────────────────
Univariate Analysis
N 46.00 Variance 2.43
Missing 0.00 Std Deviation 1.56
Mean 1.13 Range 4.00
Median 0.00 Interquartile Range 2.00
Mode 0.00 Uncorrected SS 168.00
Trimmed Mean 1.05 Corrected SS 109.22
Skewness 1.03 Coeff Variation 137.81
Kurtosis -0.57 Std Error Mean 0.23
Quantiles
Quantile Value
Max 4.00
99% 4.00
95% 4.00
90% 4.00
Q3 2.00
Median 0.00
Q1 0.00
10% 0.00
5% 0.00
1% 0.00
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
1 0 3 4
5 0 4 4
6 0 8 4
10 0 16 4
11 0 24 4
────────────────────────────────────────── Variable: POSTP2Q2 ─────────────────────────────────────────
Univariate Analysis
N 46.00 Variance 3.05
Missing 0.00 Std Deviation 1.75
Mean 2.13 Range 4.00
Median 2.00 Interquartile Range 4.00
Mode 4.00 Uncorrected SS 346.00
Trimmed Mean 2.14 Corrected SS 137.22
Skewness -0.10 Coeff Variation 81.97
Kurtosis -1.79 Std Error Mean 0.26
Quantiles
Quantile Value
Max 4.00
99% 4.00
95% 4.00
90% 4.00
Q3 4.00
Median 2.00
Q1 0.00
10% 0.00
5% 0.00
1% 0.00
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
1 0 2 4
5 0 3 4
6 0 4 4
13 0 7 4
17 0 9 4
boxplot(df$P2Q2, data = df)
boxplot(df$POSTP2Q2, data = df)
ggplot(df, aes(P2Q2)) + geom_bar()
ggplot(df, aes(POSTP2Q2)) + geom_bar()
df$P2Q3
[1] 2 3 2 2 2 2 3 3 3 2 2 1 1 0 3 2 2 1 2 2 1 3 1 3 1 0 2 1 1 1 2 2 1 2 3 2 3 0 3 1 2 2 3 2 0 3
df$POSTP2Q3
[1] 2 3 2 2 1 2 3 0 3 2 2 1 0 1 2 2 2 1 0 2 1 0 2 3 1 2 2 2 1 2 3 2 2 3 3 2 3 1 3 2 2 2 2 2 2 2
ds_summary_stats(df,P2Q3, POSTP2Q3 )
──────────────────────────────────────────── Variable: P2Q3 ───────────────────────────────────────────
Univariate Analysis
N 46.00 Variance 0.84
Missing 0.00 Std Deviation 0.92
Mean 1.85 Range 3.00
Median 2.00 Interquartile Range 1.75
Mode 2.00 Uncorrected SS 195.00
Trimmed Mean 1.88 Corrected SS 37.93
Skewness -0.41 Coeff Variation 49.69
Kurtosis -0.59 Std Error Mean 0.14
Quantiles
Quantile Value
Max 3.00
99% 3.00
95% 3.00
90% 3.00
Q3 2.75
Median 2.00
Q1 1.00
10% 1.00
5% 0.00
1% 0.00
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
14 0 2 3
26 0 7 3
38 0 8 3
45 0 9 3
12 1 15 3
────────────────────────────────────────── Variable: POSTP2Q3 ─────────────────────────────────────────
Univariate Analysis
N 46.00 Variance 0.71
Missing 0.00 Std Deviation 0.84
Mean 1.85 Range 3.00
Median 2.00 Interquartile Range 0.75
Mode 2.00 Uncorrected SS 189.00
Trimmed Mean 1.88 Corrected SS 31.93
Skewness -0.63 Coeff Variation 45.59
Kurtosis 0.15 Std Error Mean 0.12
Quantiles
Quantile Value
Max 3.00
99% 3.00
95% 3.00
90% 3.00
Q3 2.00
Median 2.00
Q1 1.25
10% 1.00
5% 0.00
1% 0.00
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
8 0 2 3
13 0 7 3
19 0 9 3
22 0 24 3
5 1 31 3
boxplot(df$P2Q3, data = df)
boxplot(df$POSTP2Q3, data = df)
ggplot(df, aes(P2Q3)) + geom_bar()
ggplot(df, aes(POSTP2Q3)) + geom_bar()
df$P2Q4
[1] 2 2 1 2 0 0 0 2 1 2 1 1 0 0 2 2 1 2 0 2 0 2 2 2 0 0 1 0 0 2 0 0 0 1 2 1 2 0 1 1 1 1 2 1 1 2
df$POSTP2Q4
[1] 1 2 2 2 2 0 2 0 2 2 2 2 0 1 2 1 2 2 0 2 2 0 2 2 2 0 2 2 1 0 2 0 1 2 2 1 2 1 2 2 1 0 2 2 2 2
ds_summary_stats(df,P2Q4, POSTP2Q4 )
──────────────────────────────────────────── Variable: P2Q4 ───────────────────────────────────────────
Univariate Analysis
N 46.00 Variance 0.71
Missing 0.00 Std Deviation 0.84
Mean 1.04 Range 2.00
Median 1.00 Interquartile Range 2.00
Mode 2.00 Uncorrected SS 82.00
Trimmed Mean 1.05 Corrected SS 31.91
Skewness -0.08 Coeff Variation 80.70
Kurtosis -1.59 Std Error Mean 0.12
Quantiles
Quantile Value
Max 2.00
99% 2.00
95% 2.00
90% 2.00
Q3 2.00
Median 1.00
Q1 0.00
10% 0.00
5% 0.00
1% 0.00
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
5 0 1 2
6 0 2 2
7 0 4 2
13 0 8 2
14 0 10 2
────────────────────────────────────────── Variable: POSTP2Q4 ─────────────────────────────────────────
Univariate Analysis
N 46.00 Variance 0.65
Missing 0.00 Std Deviation 0.81
Mean 1.43 Range 2.00
Median 2.00 Interquartile Range 1.00
Mode 2.00 Uncorrected SS 124.00
Trimmed Mean 1.48 Corrected SS 29.30
Skewness -0.97 Coeff Variation 56.24
Kurtosis -0.73 Std Error Mean 0.12
Quantiles
Quantile Value
Max 2.00
99% 2.00
95% 2.00
90% 2.00
Q3 2.00
Median 2.00
Q1 1.00
10% 0.00
5% 0.00
1% 0.00
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
6 0 2 2
8 0 3 2
13 0 4 2
19 0 5 2
22 0 7 2
boxplot(df$P2Q4, data = df)
boxplot(df$POSTP2Q4, data = df)
ggplot(df, aes(P2Q4)) + geom_bar()
ggplot(df, aes(POSTP2Q4)) + geom_bar()
df$TRANSFER1
[1] 5 10 10 8 0 8 9 0 8 5 8 8 4 5 9 7 6 3 0 7 0 0 5 9 5 0 5 6 8 10 9 0 9
[34] 7 10 5 8 3 10 9 5 3 9 5 0 0
ds_summary_stats(df,TRANSFER1 )
───────────────────────────────────────── Variable: TRANSFER1 ─────────────────────────────────────────
Univariate Analysis
N 46.00 Variance 11.83
Missing 0.00 Std Deviation 3.44
Mean 5.65 Range 10.00
Median 6.00 Interquartile Range 5.50
Mode 0.00 Uncorrected SS 2002.00
Trimmed Mean 5.71 Corrected SS 532.43
Skewness -0.52 Coeff Variation 60.86
Kurtosis -0.99 Std Error Mean 0.51
Quantiles
Quantile Value
Max 10.00
99% 10.00
95% 10.00
90% 9.50
Q3 8.75
Median 6.00
Q1 3.25
10% 0.00
5% 0.00
1% 0.00
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
5 0 2 10
8 0 3 10
19 0 30 10
21 0 35 10
22 0 39 10
boxplot(df$TRANSFER1, data = df)
ggplot(df, aes(TRANSFER1)) + geom_histogram(bins = 6)
ggplot(df, aes(x = ROLE, y = TRANSFER1)) +
geom_boxplot() +
xlab("Tutor role")
Visually there is a difference by role, but it just misses statistical significance (p = 0.056):
t.test(df$TRANSFER1 ~df$ROLE)
Welch Two Sample t-test
data: df$TRANSFER1 by df$ROLE
t = 1.9621, df = 43.595, p-value = 0.05616
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.05265162 3.89178206
sample estimates:
mean in group tutor_first mean in group tutee_first
6.557826 4.638261
Might be worth doing this with non-engaged students removed.
df$TRANSFER2
[1] 0.0 9.0 0.0 9.0 0.0 6.0 10.0 0.0 8.0 9.0 10.0 10.0 0.0 0.0 10.0 8.0 8.0 3.0 0.0
[20] 9.0 0.0 0.0 9.0 10.0 4.0 3.0 8.0 9.0 10.0 10.0 9.0 0.0 9.5 8.5 10.0 8.0 8.5 0.0
[39] 10.0 9.0 0.0 2.5 9.5 6.0 0.0 0.0
ds_summary_stats(df,TRANSFER2 )
───────────────────────────────────────── Variable: TRANSFER2 ─────────────────────────────────────────
Univariate Analysis
N 46.00 Variance 17.94
Missing 0.00 Std Deviation 4.24
Mean 5.71 Range 10.00
Median 8.00 Interquartile Range 9.00
Mode 0.00 Uncorrected SS 2305.25
Trimmed Mean 5.77 Corrected SS 807.29
Skewness -0.45 Coeff Variation 74.22
Kurtosis -1.65 Std Error Mean 0.62
Quantiles
Quantile Value
Max 10.00
99% 10.00
95% 10.00
90% 10.00
Q3 9.00
Median 8.00
Q1 0.00
10% 0.00
5% 0.00
1% 0.00
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
1 0 7 10
3 0 11 10
5 0 12 10
8 0 15 10
13 0 24 10
boxplot(df$TRANSFER2, data = df)
ggplot(df, aes(TRANSFER2)) + geom_histogram(bins = 6)
ggplot(df, aes(x = ROLE, y = TRANSFER2)) +
geom_boxplot() +
xlab("Tutor role")
Visually there is a difference by role, but it is not statistically significant (p = 0.13):
t.test(df$TRANSFER2 ~df$ROLE)
Welch Two Sample t-test
data: df$TRANSFER2 by df$ROLE
t = 1.537, df = 43.991, p-value = 0.1315
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.5886228 4.3712315
sample estimates:
mean in group tutor_first mean in group tutee_first
6.652174 4.760870
Might be worth doing this with non-engaged students removed.
Filtering out six likely cases of non-engagement:
df <- filter(df, Stdcode != "S2_AM_9", Stdcode != "S2_AM_6", Stdcode != "S2_AM_06",
Stdcode != "S2_AM_14", Stdcode != "S2_PM_010", Stdcode != "S2_PM_05")
General treatment effect:
t.test(df$POST.SCORE, df$PRE.SCORE, paired = TRUE)
Paired t-test
data: df$POST.SCORE and df$PRE.SCORE
t = 7.6966, df = 39, p-value = 2.416e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
4.460051 7.639949
sample estimates:
mean of the differences
6.05
Strong as ever.
Significant differences regarding role before intervention?
t.test(df$PRE.SCORE ~df$ROLE)
Welch Two Sample t-test
data: df$PRE.SCORE by df$ROLE
t = 0.2289, df = 36.62, p-value = 0.8202
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.927433 4.927433
sample estimates:
mean in group tutor_first mean in group tutee_first
13.65 13.15
No.
After intervention?
t.test(df$POST.SCORE ~df$ROLE)
Welch Two Sample t-test
data: df$POST.SCORE by df$ROLE
t = 0.41189, df = 37.999, p-value = 0.6827
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.131901 4.731901
sample estimates:
mean in group tutor_first mean in group tutee_first
19.85 19.05
Also not.
How about the Scatterplot transfer item?
t.test(df$TRANSFER1 ~df$ROLE)
Welch Two Sample t-test
data: df$TRANSFER1 by df$ROLE
t = 1.9116, df = 37.233, p-value = 0.06364
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.1143944 3.9463944
sample estimates:
mean in group tutor_first mean in group tutee_first
6.979 5.063
Not quite but close.
And BWD Transfer?
t.test(df$TRANSFER2 ~df$ROLE)
Welch Two Sample t-test
data: df$TRANSFER2 by df$ROLE
t = 1.5057, df = 37.894, p-value = 0.1404
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.6461973 4.3961973
sample estimates:
mean in group tutor_first mean in group tutee_first
7.350 5.475
More clearly not.
So, removing the non-engaged students does not help with the significance testing. The suggestion is not to run the analysis with these students removed, because that raises issues about the selection criteria. Instead, keep the discussion of them qualitative, to explain the cases of no learning or loss.
Reset df:
df <- read.csv("S2_Pre_Post_full.csv", header = TRUE)
There were 44 warnings (use warnings() to see them)
df$ROLE <- factor(df$ROLE, levels = c(1,9), labels = c("tutor_first", "tutee_first"))
postdf <- df %>% select(starts_with("POST"))
Look at the correlations of the post-test items. rcorr() is from the package Hmisc.
Any p-value of 0.05 or less can be considered significant.
We can think of a student’s scores in the post-test as a kind of profile, and ask if there are clusters of students with similar profiles. This is what a cluster analysis lets us find out.
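The correlation step can be sketched in base R, which avoids the Hmisc dependency: `cor()` gives the coefficients and `cor.test()` the p-value for each pair of items. This is a substitute for the `rcorr()` call mentioned above, not a reproduction of it, and it uses only three of the post-test items, copied from the vectors printed earlier, so that it runs without the CSV file:

```r
# Three post-test items, copied from the vectors printed earlier.
POSTP1Q1 <- c(2, 2, 6, 6, 0, 1, 2, 0, 6, 3, 2, 5, 0, 5, 6, 6, 6, 5, 3, 2, 2, 6, 4,
              6, 2, 6, 5, 2, 6, 6, 4, 3, 6, 4, 6, 2, 5, 2, 5, 6, 0, 6, 6, 5, 4, 1)
POSTP1Q2 <- c(4, 4, 4, 4, 0, 0, 4, 4, 4, 3, 4, 4, 0, 4, 4, 4, 4, 4, 0, 3, 0, 4, 3,
              4, 1, 4, 4, 4, 4, 4, 1, 0, 4, 4, 4, 1, 4, 2, 4, 4, 0, 0, 4, 4, 4, 1)
POSTP1Q3 <- c(2, 2, 2, 2, 1, 2, 3, 3, 3, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 3, 2,
              3, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 3, 1, 3, 2, 2, 2, 2, 2, 3, 2)
items <- cbind(POSTP1Q1, POSTP1Q2, POSTP1Q3)

# Pearson correlation matrix.
r <- cor(items)

# Matrix of p-values, one cor.test() per pair of items.
k <- ncol(items)
p <- matrix(NA_real_, k, k, dimnames = dimnames(r))
for (i in seq_len(k - 1)) {
  for (j in (i + 1):k) {
    p[i, j] <- p[j, i] <- cor.test(items[, i], items[, j])$p.value
  }
}

round(r, 2)
round(p, 3)
```

With Hmisc loaded, `rcorr(as.matrix(postdf))` should return the corresponding coefficients in its `r` component and the p-values in `P` for all post-test columns in one call.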
We may have to think about what the post-test values mean and if a standardisation is required. We might need to standardise if the maximal scores are different between items.
Using Euclidean distance, we compute the distance between the students:
dist.eucl <- dist(postdf, method = "euclidean")
The first 10 students’ distances are:
round(as.matrix(dist.eucl)[1:10, 1:10], 1)
1 2 3 4 5 6 7 8 9 10
1 0.0 7.3 12.6 10.7 10.4 11.2 8.2 4.9 12.5 6.8
2 7.3 0.0 6.8 5.1 16.4 17.4 1.4 9.2 6.5 2.4
3 12.6 6.8 0.0 2.8 22.0 22.8 6.2 15.0 2.4 6.9
4 10.7 5.1 2.8 0.0 19.9 20.7 4.7 13.0 2.4 5.3
5 10.4 16.4 22.0 19.9 0.0 3.5 17.4 9.2 21.9 15.6
6 11.2 17.4 22.8 20.7 3.5 0.0 18.3 10.0 22.6 16.4
7 8.2 1.4 6.2 4.7 17.4 18.3 0.0 10.0 5.7 3.2
8 4.9 9.2 15.0 13.0 9.2 10.0 10.0 0.0 14.8 8.6
9 12.5 6.5 2.4 2.4 21.9 22.6 5.7 14.8 0.0 7.1
10 6.8 2.4 6.9 5.3 15.6 16.4 3.2 8.6 7.1 0.0
The smaller the value, the more similar the students’ score profile.
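To make the distance concrete, the value for students 1 and 2 can be recomputed by hand. The item scores below are read off the first two entries of the post-test vectors printed earlier; the POST.SCORE totals of 15 and 21 are an assumption, taken as the sum of the eight items (which is consistent with the standardised values shown further down). Because postdf was selected with starts_with("POST"), this total is one of the columns entering the distance:

```r
# Post-test profiles of students 1 and 2 (POST.SCORE assumed = sum of items).
s1 <- c(POST.SCORE = 15, POSTP1Q1 = 2, POSTP1Q2 = 4, POSTP1Q3 = 2, POSTP1Q4 = 2,
        POSTP2Q1 = 2, POSTP2Q2 = 0, POSTP2Q3 = 2, POSTP2Q4 = 1)
s2 <- c(POST.SCORE = 21, POSTP1Q1 = 2, POSTP1Q2 = 4, POSTP1Q3 = 2, POSTP1Q4 = 2,
        POSTP2Q1 = 2, POSTP2Q2 = 4, POSTP2Q3 = 3, POSTP2Q4 = 2)

# Euclidean distance: square root of the summed squared differences.
d_manual <- sqrt(sum((s1 - s2)^2))

# The same value via dist().
d_dist <- as.numeric(dist(rbind(s1, s2), method = "euclidean"))

# Share of the squared distance that comes from the total score alone.
share_total <- unname((s1["POST.SCORE"] - s2["POST.SCORE"])^2 / sum((s1 - s2)^2))

round(d_manual, 1)
round(share_total, 2)
```

The result rounds to 7.3, matching the entry for students 1 and 2 in the matrix above. Under the stated assumption, two thirds of the squared distance comes from POST.SCORE, so it may be worth considering whether the total should be part of the profile at all.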
Let’s find clusters and visualise them.
posthc <- hclust(d = dist.eucl, method = "ward.D2")
# cex: label size
fviz_dend(posthc, cex = 0.5)
Dalal, can you see a ‘story’ at the lowest level regarding the ‘triples’ 32/6/19, 43/4/40, and 27/12/14? The numbers correspond to the row position in the data frame df (the same as in the Excel file, except that the row with the variable names is not counted in the data frame).
And can you see a pattern at the level where we have three clusters?
You read the dendrogram from bottom to top.
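To see who belongs to which group at the three-cluster level, the tree can be cut with `cutree()`. The sketch below is illustrative only: it rebuilds a small tree from the rounded 10 x 10 distance matrix printed above (students 1 to 10), so its clusters need not match those of the full dendrogram for all 46 students:

```r
# Distances between the first 10 students, as printed above (rounded).
m <- matrix(c(
   0.0,  7.3, 12.6, 10.7, 10.4, 11.2,  8.2,  4.9, 12.5,  6.8,
   7.3,  0.0,  6.8,  5.1, 16.4, 17.4,  1.4,  9.2,  6.5,  2.4,
  12.6,  6.8,  0.0,  2.8, 22.0, 22.8,  6.2, 15.0,  2.4,  6.9,
  10.7,  5.1,  2.8,  0.0, 19.9, 20.7,  4.7, 13.0,  2.4,  5.3,
  10.4, 16.4, 22.0, 19.9,  0.0,  3.5, 17.4,  9.2, 21.9, 15.6,
  11.2, 17.4, 22.8, 20.7,  3.5,  0.0, 18.3, 10.0, 22.6, 16.4,
   8.2,  1.4,  6.2,  4.7, 17.4, 18.3,  0.0, 10.0,  5.7,  3.2,
   4.9,  9.2, 15.0, 13.0,  9.2, 10.0, 10.0,  0.0, 14.8,  8.6,
  12.5,  6.5,  2.4,  2.4, 21.9, 22.6,  5.7, 14.8,  0.0,  7.1,
   6.8,  2.4,  6.9,  5.3, 15.6, 16.4,  3.2,  8.6,  7.1,  0.0
), nrow = 10, byrow = TRUE,
dimnames = list(as.character(1:10), as.character(1:10)))

# Same clustering method as in the text, applied to the 10 students.
hc10 <- hclust(as.dist(m), method = "ward.D2")

# Cut the tree so that exactly three clusters remain.
groups <- cutree(hc10, k = 3)

# Row positions of the students in each cluster.
split(names(groups), groups)
```

For the full data, `cutree(posthc, k = 3)` returns one cluster label per row of df, which can then be tabulated against ROLE or the gain scores.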
I need to look at the differences in the clustering results once I understand the implications of standardisation better. If the items have different maximal scores, standardisation is necessary in any case.
head(postdf_std, n = 6)
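The step that creates the standardised data is not shown in the text. A plausible sketch, assuming column-wise z-scores via `scale()`, is given below for two items with different maxima (6 and 3), copied from the vectors printed earlier:

```r
# Two post-test items with different maximal scores (6 and 3).
POSTP1Q1 <- c(2, 2, 6, 6, 0, 1, 2, 0, 6, 3, 2, 5, 0, 5, 6, 6, 6, 5, 3, 2, 2, 6, 4,
              6, 2, 6, 5, 2, 6, 6, 4, 3, 6, 4, 6, 2, 5, 2, 5, 6, 0, 6, 6, 5, 4, 1)
POSTP1Q3 <- c(2, 2, 2, 2, 1, 2, 3, 3, 3, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 3, 2,
              3, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 3, 1, 3, 2, 2, 2, 2, 2, 3, 2)
items <- cbind(POSTP1Q1, POSTP1Q3)

# scale() subtracts each column's mean and divides by its standard deviation,
# so every item contributes on the same scale to the distance.
items_std <- scale(items)

head(items_std, n = 6)
```

The first row comes out as about -0.898 and -0.158, which agrees with the POSTP1Q1 and POSTP1Q3 columns of the postdf_std output, so `postdf_std <- scale(postdf)` is presumably how that object was created.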
POST.SCORE POSTP1Q1 POSTP1Q2 POSTP1Q3 POSTP1Q4 POSTP2Q1 POSTP2Q2 POSTP2Q3 POSTP2Q4
[1,] -0.4097621 -0.8976099 0.655584 -0.1578729 0.6175402 -0.06160671 -1.220028 0.1806402 -0.5387811
[2,] 0.4405713 -0.8976099 0.655584 -0.1578729 0.6175402 -0.06160671 1.070637 1.3677046 0.7004155
[3,] 1.1491825 1.0228578 0.655584 -0.1578729 0.6175402 1.35534758 1.070637 0.1806402 0.7004155
[4,] 0.8657380 1.0228578 0.655584 -0.1578729 0.6175402 -0.06160671 1.070637 0.1806402 0.7004155
[5,] -1.6852622 -1.8578437 -1.806201 -1.9734109 0.6175402 -1.47856100 -1.220028 -1.0064242 0.7004155
[6,] -1.8269845 -1.3777268 -1.806201 -0.1578729 -1.8526207 -1.47856100 -1.220028 0.1806402 -1.7779778
dist.eucl <- dist(postdf_std, method = "euclidean")
posthc <- hclust(d = dist.eucl, method = "ward.D2")
fviz_dend(posthc, cex = 0.5)