Assumes the libraries tidyverse, readxl (for read_excel()), descriptr, gridExtra, Hmisc (for rcorr()) and factoextra (for fviz_dend()).
df_raw = read_excel("S1_Pre_Post_full.xlsx")
Rescaling needed:

* P1Q1 max 6, Q2 max 4, Q3 max 3, Q4 max 2, total max 15
* P2Q1 max 4, Q2 max 4, Q3 max 3, Q4 max 2, total max 13
* TRANSFER1 and 2 max 10 each.
Instead of the absolute scores we need the percentage in terms of the maximum score. We can use a scale from 0 to 10 with integer values.
df <- df_raw
df$ROLE <- factor(df$ROLE, levels = c(1,9), labels = c("tutor_first", "tutee_first"))
df$Pair <- factor(df$Pair)
# Pretest Scatterplot
df$P1Q1 <-as.integer(round(df$P1Q1/6.0 * 10, digits = 0))
df$P1Q2 <-as.integer(round(df$P1Q2/4.0 * 10, digits = 0))
df$P1Q3 <-as.integer(round(df$P1Q3/3.0 * 10, digits = 0))
df$P1Q4 <-as.integer(round(df$P1Q4/2.0 * 10, digits = 0))
# Pretest BWD
df$P2Q1 <-as.integer(round(df$P2Q1/4.0 * 10, digits = 0))
df$P2Q2 <-as.integer(round(df$P2Q2/4.0 * 10, digits = 0))
df$P2Q3 <-as.integer(round(df$P2Q3/3.0 * 10, digits = 0))
df$P2Q4 <-as.integer(round(df$P2Q4/2.0 * 10, digits = 0))
# Post-test Scatterplot
df$POSTP1Q1 <-as.integer(round(df$POSTP1Q1/6.0 * 10, digits = 0))
df$POSTP1Q2 <-as.integer(round(df$POSTP1Q2/4.0 * 10, digits = 0))
df$POSTP1Q3 <-as.integer(round(df$POSTP1Q3/3.0 * 10, digits = 0))
df$POSTP1Q4 <-as.integer(round(df$POSTP1Q4/2.0 * 10, digits = 0))
# Post-test BWD
df$POSTP2Q1 <-as.integer(round(df$POSTP2Q1/4.0 * 10, digits = 0))
df$POSTP2Q2 <-as.integer(round(df$POSTP2Q2/4.0 * 10, digits = 0))
df$POSTP2Q3 <-as.integer(round(df$POSTP2Q3/3.0 * 10, digits = 0))
df$POSTP2Q4 <-as.integer(round(df$POSTP2Q4/2.0 * 10, digits = 0))
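The sixteen rescaling lines above all follow the same rule, so a small helper can make that rule explicit. This is only a sketch: `rescale_0_10`, `max_raw` and the toy data frame are names and values introduced here for illustration and are not part of the original analysis. The maxima are the ones listed under "Rescaling needed".

```r
# Hypothetical helper: map a raw item score onto the integer 0-10 scale.
rescale_0_10 <- function(raw, max_raw) {
  as.integer(round(raw / max_raw * 10))
}

# Maximum raw score per item, as listed above (P1: 6, 4, 3, 2; P2: 4, 4, 3, 2).
max_raw <- c(P1Q1 = 6, P1Q2 = 4, P1Q3 = 3, P1Q4 = 2,
             P2Q1 = 4, P2Q2 = 4, P2Q3 = 3, P2Q4 = 2)

# Toy data frame standing in for df (values are made up).
toy <- data.frame(P1Q1 = c(0, 3, 6), P1Q3 = c(1, 2, 3), POSTP1Q3 = c(3, 3, 0))

# Rescale every pre-test and post-test column that is present in the data.
for (item in names(max_raw)) {
  for (col in intersect(c(item, paste0("POST", item)), names(toy))) {
    toy[[col]] <- rescale_0_10(toy[[col]], max_raw[[item]])
  }
}

# An item with a raw maximum of 3 can only take these rescaled values.
thirds <- rescale_0_10(0:3, 3)
thirds
```

Because of the rounding, items with few raw score levels end up on a coarse grid, which is why the item vectors printed later contain only a handful of distinct values.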
Now we need to re-compute the marginal scores. Let’s first drop the old columns:
df <- select(df, -c("PRE-SCORE", "POST-SCORE"))
And now compute the new marginal scores:
df <- mutate(df, PRESCORE = P1Q1 + P1Q2 + P1Q3 + P1Q4 + P2Q1 + P2Q2 + P2Q3 + P2Q4)
df <- mutate(df, POSTSCORE = POSTP1Q1 + POSTP1Q2 + POSTP1Q3 + POSTP1Q4 + POSTP2Q1 + POSTP2Q2 + POSTP2Q3 + POSTP2Q4)
# average scores:
df <- mutate(df, PREAVG = PRESCORE/8)
df <- mutate(df, POSTAVG = POSTSCORE/8)
I think we are ready now for the analysis.
The maximum possible total score in the test is 80 (8 items, each rescaled to a 0 to 10 scale).
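The maximum of 80 relies on every rescaled item staying within 0 to 10, so a quick range check seems worthwhile. The sketch below uses the POSTP1Q2 values exactly as they are printed in the item-level part of this document; `out_of_range` is a hypothetical helper name introduced here.

```r
# Hypothetical helper: positions of values outside the expected 0-10 range.
out_of_range <- function(x, lower = 0, upper = 10) {
  which(x < lower | x > upper)
}

# POSTP1Q2 after rescaling, copied from the printed output in this document.
postp1q2 <- c(8, 0, 10, 5, 10, 10, 10, 12, 10, 0, 10, 5, 2, 10, 10, 10)

flagged <- out_of_range(postp1q2)
flagged            # observation numbers to check against the raw data
postp1q2[flagged]  # the values outside the range
```

Observation 8 has a rescaled value of 12 on this item, which would correspond to a raw score above the stated maximum of 4, so that entry may be worth checking in the spreadsheet.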
ds_summary_stats(df,PRESCORE)
─────────────────────────── Variable: PRESCORE ───────────────────────────
Univariate Analysis
N 16.00 Variance 269.20
Missing 0.00 Std Deviation 16.41
Mean 31.44 Range 56.00
Median 26.50 Interquartile Range 19.75
Mode 21.00 Uncorrected SS 19851.00
Trimmed Mean 31.44 Corrected SS 4037.94
Skewness 0.77 Coeff Variation 52.19
Kurtosis -0.08 Std Error Mean 4.10
Quantiles
Quantile Value
Max 66.00
99% 65.10
95% 61.50
90% 53.50
Q3 40.50
Median 26.50
Q1 20.75
10% 14.50
5% 10.75
1% 10.15
Min 10.00
Extreme Values
Low High
Obs Value Obs Value
7 10 14 66
6 11 15 60
13 18 9 47
16 20 12 42
2 21 3 40
ggplot(df, aes(PRESCORE)) +
geom_histogram(bins = 6)
ggplot(df, aes(x = 1, y = PRESCORE)) +
geom_boxplot() +
scale_x_continuous(breaks = NULL) +
theme(axis.title.x = element_blank())
To interpret changes due to ROLE later: is there a difference in the pre-test between students who were subsequently in the tutor_first or tutee_first role?
ggplot(df, aes(x = ROLE, y = PRESCORE)) +
geom_boxplot() +
xlab("Tutor role")
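While the tutee_first group scores higher on average (37.9 vs. 25.0), the difference is not statistically significant and may well be due to chance with groups this small. A Welch t-test gives p greater than 0.05.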
t.test(df$PRESCORE ~df$ROLE)
Welch Two Sample t-test
data: df$PRESCORE by df$ROLE
t = -1.6585, df = 11.58, p-value = 0.124
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-29.85765 4.10765
sample estimates:
mean in group tutor_first mean in group tutee_first
25.000 37.875
The non-parametric Wilcoxon test further confirms that there is no significant difference between the two groups:
wilcox.test(df$PRESCORE ~df$ROLE)
cannot compute exact p-value with ties
Wilcoxon rank sum test with continuity correction
data: df$PRESCORE by df$ROLE
W = 20.5, p-value = 0.2476
alternative hypothesis: true location shift is not equal to 0
ds_summary_stats(df,POSTSCORE)
─────────────────────────── Variable: POSTSCORE ──────────────────────────
Univariate Analysis
N 16.00 Variance 164.25
Missing 0.00 Std Deviation 12.82
Mean 58.62 Range 38.00
Median 59.00 Interquartile Range 20.50
Mode 75.00 Uncorrected SS 57454.00
Trimmed Mean 58.62 Corrected SS 2463.75
Skewness -0.08 Coeff Variation 21.86
Kurtosis -1.28 Std Error Mean 3.20
Quantiles
Quantile Value
Max 77.00
99% 76.70
95% 75.50
90% 75.00
Q3 69.00
Median 59.00
Q1 48.50
10% 41.50
5% 39.75
1% 39.15
Min 39.00
Extreme Values
Low High
Obs Value Obs Value
2 39 5 77
10 40 14 75
13 43 15 75
4 47 1 72
12 49 3 68
# ggplot(df, aes(POSTSCORE)) + geom_bar()
ggplot(df, aes(POSTSCORE)) +
geom_histogram(bins = 10)
Check: we’d like to know whether the cases with low values are the same in the pre-test and the post-test. That would indicate non-engagement.
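One way to run this check is to compare the observation numbers of the lowest scorers in both tests. The sketch below uses the five lowest observations listed under "Extreme Values" in the PRESCORE and POSTSCORE summaries above. `lowest_obs` is a hypothetical helper that shows how the same indices could be obtained from the data frame directly.

```r
# Hypothetical helper: row numbers of the n lowest values,
# e.g. lowest_obs(df$PRESCORE); ties are broken by row order.
lowest_obs <- function(x, n = 5) {
  order(x)[seq_len(n)]
}

# Five lowest observations as reported in the "Extreme Values" tables above.
pre_low  <- c(7, 6, 13, 16, 2)
post_low <- c(2, 10, 13, 4, 12)

# Students who are among the five lowest in both tests.
low_in_both <- intersect(pre_low, post_low)
low_in_both
```

Only observations 2 and 13 appear in both lists, so most of the low pre-test scorers are not among the low post-test scorers.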
ggplot(df, aes(x = 1, y = POSTSCORE)) +
geom_boxplot() +
scale_x_continuous(breaks = NULL) +
theme(axis.title.x = element_blank())
By role:
ggplot(df, aes(x = ROLE, y = POSTSCORE)) +
geom_boxplot() +
xlab("Tutor role")
By inspection, the difference between the two conditions is marginal, also keeping in mind that the pre-test scores were somewhat higher for the tutee_first condition. A t-test reveals no significant difference.
t.test(df$POSTSCORE ~df$ROLE)
Welch Two Sample t-test
data: df$POSTSCORE by df$ROLE
t = 0.65035, df = 13.989, p-value = 0.526
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-9.767117 18.267117
sample estimates:
mean in group tutor_first mean in group tutee_first
60.75 56.50
In the further analysis we treat the two groups as comparable.
The intervention was clearly effective:
t.test(df$POSTSCORE, df$PRESCORE, paired = TRUE)
Paired t-test
data: df$POSTSCORE and df$PRESCORE
t = 7.6385, df = 15, p-value = 1.515e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
19.60107 34.77393
sample estimates:
mean of the differences
27.1875
Looking at the individual gain scores, we see that all of them are positive, though with considerable variation.
df$gain <- df$POSTSCORE - df$PRESCORE
df$gain
[1] 35 18 28 25 42 44 55 39 13 11 37 7 25 9 15 32
ds_summary_stats(df, gain)
───────────────────────────── Variable: gain ─────────────────────────────
Univariate Analysis
N 16.00 Variance 202.70
Missing 0.00 Std Deviation 14.24
Mean 27.19 Range 48.00
Median 26.50 Interquartile Range 23.00
Mode 25.00 Uncorrected SS 14867.00
Trimmed Mean 27.19 Corrected SS 3040.44
Skewness 0.23 Coeff Variation 52.37
Kurtosis -0.86 Std Error Mean 3.56
Quantiles
Quantile Value
Max 55.00
99% 53.35
95% 46.75
90% 43.00
Q3 37.50
Median 26.50
Q1 14.50
10% 10.00
5% 8.50
1% 7.30
Min 7.00
Extreme Values
Low High
Obs Value Obs Value
12 7 7 55
14 9 6 44
10 11 5 42
9 13 8 39
15 15 11 37
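The paired t-test reported above is equivalent to a one-sample t-test on the gain scores, which can be reproduced by hand from the printed `df$gain` vector. This is a minimal sketch. The standardised mean gain `d_z` is an addition for illustration and was not part of the original analysis.

```r
# Gain scores as printed above (POSTSCORE - PRESCORE for the 16 students).
gain <- c(35, 18, 28, 25, 42, 44, 55, 39, 13, 11, 37, 7, 25, 9, 15, 32)

n         <- length(gain)
mean_gain <- mean(gain)
sd_gain   <- sd(gain)

# Paired t statistic: mean difference divided by its standard error.
t_stat <- mean_gain / (sd_gain / sqrt(n))

# Standardised mean gain (often called Cohen's d_z), added for illustration.
d_z <- mean_gain / sd_gain

# The same test via t.test(), as a one-sample test on the gains.
res <- t.test(gain, mu = 0)
c(t_by_hand = t_stat, t_from_t.test = unname(res$statistic), d_z = d_z)
```

The hand-computed statistic agrees with the t = 7.6385 reported by the paired test, and the mean and standard deviation match the `ds_summary_stats(df, gain)` output.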
Strong learning gains:
t.test(df$POSTAVG, df$PREAVG, paired = TRUE)
Paired t-test
data: df$POSTAVG and df$PREAVG
t = 7.6385, df = 15, p-value = 1.515e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
2.450134 4.346741
sample estimates:
mean of the differences
3.398438
df$gain_avg <- df$POSTAVG - df$PREAVG
df$gain_avg
[1] 4.375 2.250 3.500 3.125 5.250 5.500 6.875 4.875 1.625 1.375 4.625
[12] 0.875 3.125 1.125 1.875 4.000
ds_summary_stats(df,gain_avg)
─────────────────────────── Variable: gain_avg ───────────────────────────
Univariate Analysis
N 16.00 Variance 3.17
Missing 0.00 Std Deviation 1.78
Mean 3.40 Range 6.00
Median 3.31 Interquartile Range 2.88
Mode 3.12 Uncorrected SS 232.30
Trimmed Mean 3.40 Corrected SS 47.51
Skewness 0.23 Coeff Variation 52.37
Kurtosis -0.86 Std Error Mean 0.44
Quantiles
Quantile Value
Max 6.88
99% 6.67
95% 5.84
90% 5.38
Q3 4.69
Median 3.31
Q1 1.81
10% 1.25
5% 1.06
1% 0.91
Min 0.88
Extreme Values
Low High
Obs Value Obs Value
12 0.875 7 6.875
14 1.125 6 5.5
10 1.375 5 5.25
9 1.625 8 4.875
15 1.875 11 4.625
ds_summary_stats(df,PREAVG)
──────────────────────────── Variable: PREAVG ────────────────────────────
Univariate Analysis
N 16.00 Variance 4.21
Missing 0.00 Std Deviation 2.05
Mean 3.93 Range 7.00
Median 3.31 Interquartile Range 2.47
Mode 2.62 Uncorrected SS 310.17
Trimmed Mean 3.93 Corrected SS 63.09
Skewness 0.77 Coeff Variation 52.19
Kurtosis -0.08 Std Error Mean 0.51
Quantiles
Quantile Value
Max 8.25
99% 8.14
95% 7.69
90% 6.69
Q3 5.06
Median 3.31
Q1 2.59
10% 1.81
5% 1.34
1% 1.27
Min 1.25
Extreme Values
Low High
Obs Value Obs Value
7 1.25 14 8.25
6 1.375 15 7.5
13 2.25 9 5.875
16 2.5 12 5.25
2 2.625 3 5
ggplot(df, aes(PREAVG)) +
geom_histogram(bins = 8)
ggplot(df, aes(x = 1, y = PREAVG)) +
geom_boxplot() +
scale_x_continuous(breaks = NULL) +
theme(axis.title.x = element_blank())
To interpret changes due to ROLE later: is there a difference in the pre-test between students who were subsequently in the tutor_first or tutee_first role?
ggplot(df, aes(x = ROLE, y = PREAVG)) +
geom_boxplot() +
xlab("Tutor role")
While the tutee_first is slightly better, this is likely random. A t-test agrees, with p greater than 0.05.
t.test(df$PREAVG ~df$ROLE)
Welch Two Sample t-test
data: df$PREAVG by df$ROLE
t = -1.6585, df = 11.58, p-value = 0.124
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.7322062 0.5134562
sample estimates:
mean in group tutor_first mean in group tutee_first
3.125000 4.734375
The non-parametric Wilcoxon test further confirms that there is no significant difference between the two groups:
wilcox.test(df$PREAVG ~df$ROLE)
cannot compute exact p-value with ties
Wilcoxon rank sum test with continuity correction
data: df$PREAVG by df$ROLE
W = 20.5, p-value = 0.2476
alternative hypothesis: true location shift is not equal to 0
ds_summary_stats(df,POSTAVG)
──────────────────────────── Variable: POSTAVG ───────────────────────────
Univariate Analysis
N 16.00 Variance 2.57
Missing 0.00 Std Deviation 1.60
Mean 7.33 Range 4.75
Median 7.38 Interquartile Range 2.56
Mode 9.38 Uncorrected SS 897.72
Trimmed Mean 7.33 Corrected SS 38.50
Skewness -0.08 Coeff Variation 21.86
Kurtosis -1.28 Std Error Mean 0.40
Quantiles
Quantile Value
Max 9.62
99% 9.59
95% 9.44
90% 9.38
Q3 8.62
Median 7.38
Q1 6.06
10% 5.19
5% 4.97
1% 4.89
Min 4.88
Extreme Values
Low High
Obs Value Obs Value
2 4.875 5 9.625
10 5 14 9.375
13 5.375 15 9.375
4 5.875 1 9
12 6.125 3 8.5
ggplot(df, aes(POSTAVG)) +
geom_histogram(bins = 10)
ggplot(df, aes(x = 1, y = POSTAVG)) +
geom_boxplot() +
scale_x_continuous(breaks = NULL) +
theme(axis.title.x = element_blank())
Is there a difference in the post-test between students who were in the tutor_first or tutee_first role?
ggplot(df, aes(x = ROLE, y = POSTAVG)) +
geom_boxplot() +
xlab("Tutor role")
While the tutor_first is slightly better, this is likely random. A t-test agrees, with p greater than 0.05.
t.test(df$POSTAVG ~df$ROLE)
Welch Two Sample t-test
data: df$POSTAVG by df$ROLE
t = 0.65035, df = 13.989, p-value = 0.526
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.22089 2.28339
sample estimates:
mean in group tutor_first mean in group tutee_first
7.59375 7.06250
The non-parametric Wilcoxon test further confirms that there is no significant difference between the two groups:
wilcox.test(df$POSTAVG ~df$ROLE)
cannot compute exact p-value with ties
Wilcoxon rank sum test with continuity correction
data: df$POSTAVG by df$ROLE
W = 38, p-value = 0.5632
alternative hypothesis: true location shift is not equal to 0
df$P1Q1
[1] 3 2 3 0 0 0 0 5 0 0 2 3 2 3 3 0
There were 16 warnings (use warnings() to see them)
df$POSTP1Q1
[1] 10 5 8 3 10 3 5 7 8 5 8 3 8 5 10 8
ds_summary_stats(df,P1Q1, POSTP1Q1 )
───────────────────────────── Variable: P1Q1 ─────────────────────────────
Univariate Analysis
N 16.00 Variance 2.65
Missing 0.00 Std Deviation 1.63
Mean 1.62 Range 5.00
Median 2.00 Interquartile Range 3.00
Mode 0.00 Uncorrected SS 82.00
Trimmed Mean 1.62 Corrected SS 39.75
Skewness 0.38 Coeff Variation 100.18
Kurtosis -0.92 Std Error Mean 0.41
Quantiles
Quantile Value
Max 5.00
99% 4.70
95% 3.50
90% 3.00
Q3 3.00
Median 2.00
Q1 0.00
10% 0.00
5% 0.00
1% 0.00
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
4 0 8 5
5 0 1 3
6 0 3 3
7 0 12 3
9 0 14 3
─────────────────────────── Variable: POSTP1Q1 ───────────────────────────
Univariate Analysis
N 16.00 Variance 6.25
Missing 0.00 Std Deviation 2.50
Mean 6.62 Range 7.00
Median 7.50 Interquartile Range 3.00
Mode 8.00 Uncorrected SS 796.00
Trimmed Mean 6.62 Corrected SS 93.75
Skewness -0.15 Coeff Variation 37.74
Kurtosis -1.28 Std Error Mean 0.62
Quantiles
Quantile Value
Max 10.00
99% 10.00
95% 10.00
90% 10.00
Q3 8.00
Median 7.50
Q1 5.00
10% 3.00
5% 3.00
1% 3.00
Min 3.00
Extreme Values
Low High
Obs Value Obs Value
4 3 1 10
6 3 5 10
12 3 15 10
2 5 3 8
7 5 9 8
boxplot(df$P1Q1)
boxplot(df$POSTP1Q1)
ggplot(df, aes(P1Q1)) + geom_bar()
ggplot(df, aes(POSTP1Q1)) + geom_bar()
df$P1Q2
[1] 0 0 0 0 0 0 0 0 10 0 0 0 0 10 10 0
df$POSTP1Q2
[1] 8 0 10 5 10 10 10 12 10 0 10 5 2 10 10 10
ds_summary_stats(df,P1Q2, POSTP1Q2)
───────────────────────────── Variable: P1Q2 ─────────────────────────────
Univariate Analysis
N 16.00 Variance 16.25
Missing 0.00 Std Deviation 4.03
Mean 1.88 Range 10.00
Median 0.00 Interquartile Range 0.00
Mode 0.00 Uncorrected SS 300.00
Trimmed Mean 1.88 Corrected SS 243.75
Skewness 1.77 Coeff Variation 214.99
Kurtosis 1.28 Std Error Mean 1.01
Quantiles
Quantile Value
Max 10.00
99% 10.00
95% 10.00
90% 10.00
Q3 0.00
Median 0.00
Q1 0.00
10% 0.00
5% 0.00
1% 0.00
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
1 0 9 10
2 0 14 10
3 0 15 10
4 0 1 0
5 0 2 0
─────────────────────────── Variable: POSTP1Q2 ───────────────────────────
Univariate Analysis
N 16.00 Variance 15.45
Missing 0.00 Std Deviation 3.93
Mean 7.62 Range 12.00
Median 10.00 Interquartile Range 5.00
Mode 10.00 Uncorrected SS 1162.00
Trimmed Mean 7.62 Corrected SS 231.75
Skewness -1.12 Coeff Variation 51.55
Kurtosis -0.16 Std Error Mean 0.98
Quantiles
Quantile Value
Max 12.00
99% 11.70
95% 10.50
90% 10.00
Q3 10.00
Median 10.00
Q1 5.00
10% 1.00
5% 0.00
1% 0.00
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
2 0 8 12
10 0 3 10
13 2 5 10
4 5 6 10
12 5 7 10
ggplot(df, aes(P1Q2)) + geom_bar()
ggplot(df, aes(POSTP1Q2)) + geom_bar()
df$P1Q3
[1] 7 7 10 7 3 3 0 7 7 10 7 10 3 10 10 7
df$POSTP1Q3
[1] 7 7 10 7 7 7 3 7 7 3 3 7 7 10 10 7
ds_summary_stats(df,P1Q3, POSTP1Q3 )
───────────────────────────── Variable: P1Q3 ─────────────────────────────
Univariate Analysis
N 16.00 Variance 9.40
Missing 0.00 Std Deviation 3.07
Mean 6.75 Range 10.00
Median 7.00 Interquartile Range 4.00
Mode 7.00 Uncorrected SS 870.00
Trimmed Mean 6.75 Corrected SS 141.00
Skewness -0.78 Coeff Variation 45.42
Kurtosis -0.07 Std Error Mean 0.77
Quantiles
Quantile Value
Max 10.00
99% 10.00
95% 10.00
90% 10.00
Q3 10.00
Median 7.00
Q1 6.00
10% 3.00
5% 2.25
1% 0.45
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
7 0 3 10
5 3 10 10
6 3 12 10
13 3 14 10
1 7 15 10
─────────────────────────── Variable: POSTP1Q3 ───────────────────────────
Univariate Analysis
N 16.00 Variance 4.96
Missing 0.00 Std Deviation 2.23
Mean 6.81 Range 7.00
Median 7.00 Interquartile Range 0.00
Mode 7.00 Uncorrected SS 817.00
Trimmed Mean 6.81 Corrected SS 74.44
Skewness -0.48 Coeff Variation 32.70
Kurtosis 0.11 Std Error Mean 0.56
Quantiles
Quantile Value
Max 10.00
99% 10.00
95% 10.00
90% 10.00
Q3 7.00
Median 7.00
Q1 7.00
10% 3.00
5% 3.00
1% 3.00
Min 3.00
Extreme Values
Low High
Obs Value Obs Value
7 3 3 10
10 3 14 10
11 3 15 10
1 7 1 7
2 7 2 7
boxplot(df$P1Q3)
boxplot(df$POSTP1Q3)
ggplot(df, aes(P1Q3)) + geom_bar()
ggplot(df, aes(POSTP1Q3)) + geom_bar()
df$P1Q4
[1] 10 5 10 5 10 0 0 0 0 5 0 10 0 10 10 10
df$POSTP1Q4
[1] 10 10 10 10 10 10 10 10 0 5 0 10 5 10 10 10
ds_summary_stats(df,P1Q4, POSTP1Q4 )
───────────────────────────── Variable: P1Q4 ─────────────────────────────
Univariate Analysis
N 16.00 Variance 21.56
Missing 0.00 Std Deviation 4.64
Mean 5.31 Range 10.00
Median 5.00 Interquartile Range 10.00
Mode 10.00 Uncorrected SS 775.00
Trimmed Mean 5.31 Corrected SS 323.44
Skewness -0.14 Coeff Variation 87.41
Kurtosis -1.96 Std Error Mean 1.16
Quantiles
Quantile Value
Max 10.00
99% 10.00
95% 10.00
90% 10.00
Q3 10.00
Median 5.00
Q1 0.00
10% 0.00
5% 0.00
1% 0.00
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
6 0 1 10
7 0 3 10
8 0 5 10
9 0 12 10
11 0 14 10
─────────────────────────── Variable: POSTP1Q4 ───────────────────────────
Univariate Analysis
N 16.00 Variance 12.92
Missing 0.00 Std Deviation 3.59
Mean 8.12 Range 10.00
Median 10.00 Interquartile Range 1.25
Mode 10.00 Uncorrected SS 1250.00
Trimmed Mean 8.12 Corrected SS 193.75
Skewness -1.73 Coeff Variation 44.23
Kurtosis 1.70 Std Error Mean 0.90
Quantiles
Quantile Value
Max 10.00
99% 10.00
95% 10.00
90% 10.00
Q3 10.00
Median 10.00
Q1 8.75
10% 2.50
5% 0.00
1% 0.00
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
9 0 1 10
11 0 2 10
10 5 3 10
13 5 4 10
1 10 5 10
boxplot(df$P1Q4)
boxplot(df$POSTP1Q4)
ggplot(df, aes(P1Q4)) + geom_bar()
ggplot(df, aes(POSTP1Q4)) + geom_bar()
df$P2Q1
[1] 0 0 2 0 0 0 2 0 5 2 0 2 0 8 2 0
df$POSTP2Q1
[1] 10 10 10 5 10 8 10 10 10 5 10 2 2 10 5 8
ds_summary_stats(df,P2Q1, POSTP2Q1 )
───────────────────────────── Variable: P2Q1 ─────────────────────────────
Univariate Analysis
N 16.00 Variance 5.06
Missing 0.00 Std Deviation 2.25
Mean 1.44 Range 8.00
Median 0.00 Interquartile Range 2.00
Mode 0.00 Uncorrected SS 109.00
Trimmed Mean 1.44 Corrected SS 75.94
Skewness 2.02 Coeff Variation 156.52
Kurtosis 4.28 Std Error Mean 0.56
Quantiles
Quantile Value
Max 8.00
99% 7.55
95% 5.75
90% 3.50
Q3 2.00
Median 0.00
Q1 0.00
10% 0.00
5% 0.00
1% 0.00
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
1 0 14 8
2 0 9 5
4 0 3 2
5 0 7 2
6 0 10 2
─────────────────────────── Variable: POSTP2Q1 ───────────────────────────
Univariate Analysis
N 16.00 Variance 8.96
Missing 0.00 Std Deviation 2.99
Mean 7.81 Range 8.00
Median 10.00 Interquartile Range 5.00
Mode 10.00 Uncorrected SS 1111.00
Trimmed Mean 7.81 Corrected SS 134.44
Skewness -1.04 Coeff Variation 38.32
Kurtosis -0.39 Std Error Mean 0.75
Quantiles
Quantile Value
Max 10.00
99% 10.00
95% 10.00
90% 10.00
Q3 10.00
Median 10.00
Q1 5.00
10% 3.50
5% 2.00
1% 2.00
Min 2.00
Extreme Values
Low High
Obs Value Obs Value
12 2 1 10
13 2 2 10
4 5 3 10
10 5 5 10
15 5 7 10
boxplot(df$P2Q1)
boxplot(df$POSTP2Q1)
ggplot(df, aes(P2Q1)) + geom_bar()
ggplot(df, aes(POSTP2Q1)) + geom_bar()
df$P2Q2
[1] 0 0 0 0 5 0 0 0 5 0 0 0 0 10 5 0
df$POSTP2Q2
[1] 10 0 10 10 10 10 10 10 8 5 10 5 2 10 10 2
ds_summary_stats(df,P2Q2, POSTP2Q2 )
───────────────────────────── Variable: P2Q2 ─────────────────────────────
Univariate Analysis
N 16.00 Variance 9.06
Missing 0.00 Std Deviation 3.01
Mean 1.56 Range 10.00
Median 0.00 Interquartile Range 1.25
Mode 0.00 Uncorrected SS 175.00
Trimmed Mean 1.56 Corrected SS 135.94
Skewness 1.89 Coeff Variation 192.67
Kurtosis 3.03 Std Error Mean 0.75
Quantiles
Quantile Value
Max 10.00
99% 9.25
95% 6.25
90% 5.00
Q3 1.25
Median 0.00
Q1 0.00
10% 0.00
5% 0.00
1% 0.00
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
1 0 14 10
2 0 5 5
3 0 9 5
4 0 15 5
6 0 1 0
─────────────────────────── Variable: POSTP2Q2 ───────────────────────────
Univariate Analysis
N 16.00 Variance 12.78
Missing 0.00 Std Deviation 3.58
Mean 7.62 Range 10.00
Median 10.00 Interquartile Range 5.00
Mode 10.00 Uncorrected SS 1122.00
Trimmed Mean 7.62 Corrected SS 191.75
Skewness -1.17 Coeff Variation 46.89
Kurtosis -0.18 Std Error Mean 0.89
Quantiles
Quantile Value
Max 10.00
99% 10.00
95% 10.00
90% 10.00
Q3 10.00
Median 10.00
Q1 5.00
10% 2.00
5% 1.50
1% 0.30
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
2 0 1 10
13 2 3 10
16 2 4 10
10 5 5 10
12 5 6 10
boxplot(df$P2Q2)
boxplot(df$POSTP2Q2)
ggplot(df, aes(P2Q2)) + geom_bar()
ggplot(df, aes(POSTP2Q2)) + geom_bar()
df$P2Q3
[1] 7 7 10 10 7 3 3 7 10 7 7 7 3 10 10 3
df$POSTP2Q3
[1] 7 7 10 7 10 7 7 7 7 7 7 7 7 10 10 7
ds_summary_stats(df,P2Q3, POSTP2Q3 )
───────────────────────────── Variable: P2Q3 ─────────────────────────────
Univariate Analysis
N 16.00 Variance 7.26
Missing 0.00 Std Deviation 2.69
Mean 6.94 Range 7.00
Median 7.00 Interquartile Range 4.00
Mode 7.00 Uncorrected SS 879.00
Trimmed Mean 6.94 Corrected SS 108.94
Skewness -0.39 Coeff Variation 38.85
Kurtosis -1.06 Std Error Mean 0.67
Quantiles
Quantile Value
Max 10.00
99% 10.00
95% 10.00
90% 10.00
Q3 10.00
Median 7.00
Q1 6.00
10% 3.00
5% 3.00
1% 3.00
Min 3.00
Extreme Values
Low High
Obs Value Obs Value
6 3 3 10
7 3 4 10
13 3 9 10
16 3 14 10
1 7 15 10
─────────────────────────── Variable: POSTP2Q3 ───────────────────────────
Univariate Analysis
N 16.00 Variance 1.80
Missing 0.00 Std Deviation 1.34
Mean 7.75 Range 3.00
Median 7.00 Interquartile Range 0.75
Mode 7.00 Uncorrected SS 988.00
Trimmed Mean 7.75 Corrected SS 27.00
Skewness 1.28 Coeff Variation 17.31
Kurtosis -0.44 Std Error Mean 0.34
Quantiles
Quantile Value
Max 10.00
99% 10.00
95% 10.00
90% 10.00
Q3 7.75
Median 7.00
Q1 7.00
10% 7.00
5% 7.00
1% 7.00
Min 7.00
Extreme Values
Low High
Obs Value Obs Value
1 7 3 10
2 7 5 10
4 7 14 10
6 7 15 10
7 7 1 7
boxplot(df$P2Q3)
boxplot(df$POSTP2Q3)
ggplot(df, aes(P2Q3)) + geom_bar()
ggplot(df, aes(POSTP2Q3)) + geom_bar()
df$P2Q4
[1] 10 0 5 0 10 5 5 5 10 5 5 10 10 5 10 0
df$POSTP2Q4
[1] 10 0 0 0 10 0 10 0 10 10 10 10 10 10 10 0
ds_summary_stats(df,P2Q4, POSTP2Q4 )
───────────────────────────── Variable: P2Q4 ─────────────────────────────
Univariate Analysis
N 16.00 Variance 14.06
Missing 0.00 Std Deviation 3.75
Mean 5.94 Range 10.00
Median 5.00 Interquartile Range 5.00
Mode 5.00 Uncorrected SS 775.00
Trimmed Mean 5.94 Corrected SS 210.94
Skewness -0.33 Coeff Variation 63.16
Kurtosis -1.00 Std Error Mean 0.94
Quantiles
Quantile Value
Max 10.00
99% 10.00
95% 10.00
90% 10.00
Q3 10.00
Median 5.00
Q1 5.00
10% 0.00
5% 0.00
1% 0.00
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
2 0 1 10
4 0 5 10
16 0 9 10
3 5 12 10
6 5 13 10
─────────────────────────── Variable: POSTP2Q4 ───────────────────────────
Univariate Analysis
N 16.00 Variance 25.00
Missing 0.00 Std Deviation 5.00
Mean 6.25 Range 10.00
Median 10.00 Interquartile Range 10.00
Mode 10.00 Uncorrected SS 1000.00
Trimmed Mean 6.25 Corrected SS 375.00
Skewness -0.57 Coeff Variation 80.00
Kurtosis -1.93 Std Error Mean 1.25
Quantiles
Quantile Value
Max 10.00
99% 10.00
95% 10.00
90% 10.00
Q3 10.00
Median 10.00
Q1 0.00
10% 0.00
5% 0.00
1% 0.00
Min 0.00
Extreme Values
Low High
Obs Value Obs Value
2 0 1 10
3 0 5 10
4 0 7 10
6 0 9 10
8 0 10 10
boxplot(df$P2Q4)
boxplot(df$POSTP2Q4)
ggplot(df, aes(P2Q4)) + geom_bar()
ggplot(df, aes(POSTP2Q4)) + geom_bar()
Let’s concentrate on the posttest items because we assume more or less zero knowledge in pre-test.
postdf <- df %>% select(starts_with("POST"))
There were 42 warnings (use warnings() to see them)
Note: We use the Stdcode and the factors here as well. Perhaps this can be done more elegantly?
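One thing to watch with `starts_with("POST")` is that it also picks up POSTSCORE and POSTAVG, as the correlation matrix below shows, so the totals enter the later distance computation alongside the items. A possible alternative, sketched here in base R on a made-up toy data frame, is to select the item columns by a name pattern. The pattern and the names `item_cols` and `post_items` are assumptions for illustration.

```r
# Toy data frame with the same column naming scheme (values are made up).
toy <- data.frame(
  POSTP1Q1 = c(10, 5), POSTP2Q4 = c(10, 0),
  POSTSCORE = c(72, 39), POSTAVG = c(9, 4.875), PRESCORE = c(37, 21)
)

# Keep only post-test item columns: POSTP1Q1..POSTP1Q4 and POSTP2Q1..POSTP2Q4.
item_cols  <- grep("^POSTP[12]Q[1-4]$", names(toy), value = TRUE)
post_items <- toy[, item_cols, drop = FALSE]
names(post_items)
```

With dplyr, `select(df, matches("^POSTP[12]Q[1-4]$"))` should give the same columns.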
Look at the correlations of the post-test items. rcorr() is from the package Hmisc.
rcorr(as.matrix(postdf))
POSTP1Q1 POSTP1Q2 POSTP1Q3 POSTP1Q4 POSTP2Q1 POSTP2Q2 POSTP2Q3
POSTP1Q1 1.00 0.35 0.20 -0.19 0.28 0.13 0.39
POSTP1Q2 0.35 1.00 0.25 0.09 0.54 0.71 0.36
POSTP1Q3 0.20 0.25 1.00 0.45 -0.02 0.11 0.65
POSTP1Q4 -0.19 0.09 0.45 1.00 0.00 0.07 0.31
POSTP2Q1 0.28 0.54 -0.02 0.00 1.00 0.39 0.19
POSTP2Q2 0.13 0.71 0.11 0.07 0.39 1.00 0.40
POSTP2Q3 0.39 0.36 0.65 0.31 0.19 0.40 1.00
POSTP2Q4 0.31 -0.04 -0.25 -0.42 -0.18 0.14 0.15
POSTSCORE 0.55 0.79 0.42 0.24 0.51 0.75 0.70
POSTAVG 0.55 0.79 0.42 0.24 0.51 0.75 0.70
POSTP2Q4 POSTSCORE POSTAVG
POSTP1Q1 0.31 0.55 0.55
POSTP1Q2 -0.04 0.79 0.79
POSTP1Q3 -0.25 0.42 0.42
POSTP1Q4 -0.42 0.24 0.24
POSTP2Q1 -0.18 0.51 0.51
POSTP2Q2 0.14 0.75 0.75
POSTP2Q3 0.15 0.70 0.70
POSTP2Q4 1.00 0.29 0.29
POSTSCORE 0.29 1.00 1.00
POSTAVG 0.29 1.00 1.00
n= 16
P
POSTP1Q1 POSTP1Q2 POSTP1Q3 POSTP1Q4 POSTP2Q1 POSTP2Q2 POSTP2Q3
POSTP1Q1 0.1824 0.4531 0.4698 0.2866 0.6250 0.1380
POSTP1Q2 0.1824 0.3498 0.7445 0.0297 0.0023 0.1704
POSTP1Q3 0.4531 0.3498 0.0782 0.9542 0.6912 0.0062
POSTP1Q4 0.4698 0.7445 0.0782 0.9886 0.7929 0.2409
POSTP2Q1 0.2866 0.0297 0.9542 0.9886 0.1405 0.4887
POSTP2Q2 0.6250 0.0023 0.6912 0.7929 0.1405 0.1288
POSTP2Q3 0.1380 0.1704 0.0062 0.2409 0.4887 0.1288
POSTP2Q4 0.2480 0.8761 0.3566 0.1077 0.4958 0.6055 0.5816
POSTSCORE 0.0284 0.0003 0.1096 0.3768 0.0450 0.0009 0.0023
POSTAVG 0.0284 0.0003 0.1096 0.3768 0.0450 0.0009 0.0023
POSTP2Q4 POSTSCORE POSTAVG
POSTP1Q1 0.2480 0.0284 0.0284
POSTP1Q2 0.8761 0.0003 0.0003
POSTP1Q3 0.3566 0.1096 0.1096
POSTP1Q4 0.1077 0.3768 0.3768
POSTP2Q1 0.4958 0.0450 0.0450
POSTP2Q2 0.6055 0.0009 0.0009
POSTP2Q3 0.5816 0.0023 0.0023
POSTP2Q4 0.2782 0.2782
POSTSCORE 0.2782 0.0000
POSTAVG 0.2782 0.0000
Any p-value less than or equal to 0.05 can be considered significant.
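Since the matrix contains many pairwise tests, it may be worth checking how the smallest p-values fare under a correction for multiple comparisons. The sketch below takes the three item pairs with p of at most 0.05 from the matrix above and applies a Holm adjustment. Treating all 28 item pairs as one family of tests is an assumption made here, not a choice from the original analysis.

```r
# p-values of the three item pairs with p <= 0.05 in the matrix above.
p_raw <- c(P1Q2_P2Q2 = 0.0023, P1Q3_P2Q3 = 0.0062, P1Q2_P2Q1 = 0.0297)

# Number of distinct item pairs among the 8 post-test items.
n_pairs <- choose(8, 2)

# Holm adjustment, counting all item pairs as one family of tests.
p_holm <- p.adjust(p_raw, method = "holm", n = n_pairs)
round(p_holm, 4)
```

Under this rather conservative adjustment none of the three pairs stays below 0.05, so the item correlations are probably best read as exploratory.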
We can think of a students’ scores in the post-test as a kind of profile, and ask if there are clusters of students with similar profiles. This is what a cluster analysis lets us find out.
We may have to think about what the post-test values mean and if a standardisation is required. We might need to standardise if the maximal scores are different between items.
Using Euclidean distance, we compute the distance between the students:
dist.eucl <- dist(postdf, method = "euclidean")
The first 10 students’ distances are:
round(as.matrix(dist.eucl)[1:10, 1:10], 1)
The smaller the value, the more similar the students’ score profile.
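To make the distance concrete, here is a hand computation for students 1 and 2, using their eight post-test item scores as printed in the item-level output earlier. This sketch uses the items only, so the value will differ from the matrix computed from `postdf`, which also contains POSTSCORE and POSTAVG.

```r
# Post-test item profiles of students 1 and 2, copied from the item vectors
# printed earlier (order: P1Q1..P1Q4, P2Q1..P2Q4).
student1 <- c(10, 8, 7, 10, 10, 10, 7, 10)
student2 <- c(5, 0, 7, 10, 10, 0, 7, 0)

# Euclidean distance: square root of the summed squared differences.
d_manual <- sqrt(sum((student1 - student2)^2))

# The same value via dist(), which works row-wise on a matrix.
d_dist <- as.numeric(dist(rbind(student1, student2), method = "euclidean"))
c(manual = d_manual, dist = d_dist)
```

The two students differ on four items (by 5, 8, 10 and 10 points), which gives a distance of 17.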
Let’s find clusters and visualise them.
posthc <- hclust(d = dist.eucl, method = "ward.D2")
# cex: label size
fviz_dend(posthc, cex = 0.5)
Dalal, the student numbers are the row numbers in the Excel table minus 1 (for the row of variable names).
Can you see a pattern at the level where we have three clusters?
You read the dendrogram from bottom to top.
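To see which students fall into which of the three clusters, the tree can be cut with `cutree()`. The sketch below rebuilds the post-test item matrix from the item vectors printed earlier in this document and uses the items only. Memberships may therefore differ from the dendrogram above, which is based on `postdf` including POSTSCORE and POSTAVG.

```r
# Post-test item scores for the 16 students, copied from the printed vectors.
post_items <- cbind(
  POSTP1Q1 = c(10, 5, 8, 3, 10, 3, 5, 7, 8, 5, 8, 3, 8, 5, 10, 8),
  POSTP1Q2 = c(8, 0, 10, 5, 10, 10, 10, 12, 10, 0, 10, 5, 2, 10, 10, 10),
  POSTP1Q3 = c(7, 7, 10, 7, 7, 7, 3, 7, 7, 3, 3, 7, 7, 10, 10, 7),
  POSTP1Q4 = c(10, 10, 10, 10, 10, 10, 10, 10, 0, 5, 0, 10, 5, 10, 10, 10),
  POSTP2Q1 = c(10, 10, 10, 5, 10, 8, 10, 10, 10, 5, 10, 2, 2, 10, 5, 8),
  POSTP2Q2 = c(10, 0, 10, 10, 10, 10, 10, 10, 8, 5, 10, 5, 2, 10, 10, 2),
  POSTP2Q3 = c(7, 7, 10, 7, 10, 7, 7, 7, 7, 7, 7, 7, 7, 10, 10, 7),
  POSTP2Q4 = c(10, 0, 0, 0, 10, 0, 10, 0, 10, 10, 10, 10, 10, 10, 10, 0)
)

# Same distance and linkage settings as in the analysis above.
hc <- hclust(dist(post_items, method = "euclidean"), method = "ward.D2")

# Cut the tree into three clusters and list the students in each.
clusters <- cutree(hc, k = 3)
table(clusters)
split(seq_along(clusters), clusters)
```

The student numbers returned by `split()` are row numbers in the data frame, which makes it easier to compare the clusters against the score profiles.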
I need to look at the differences in the clustering results once I understand the implications of standardisation better. If the items have different maximal scores, standardisation is necessary in any case. But we have already rescaled the items to values from 0 to 10, so at this stage I think that scaling in the cluster-analysis sense is not needed.
postdf_std <- scale(postdf)
head(postdf_std, n = 6)
POSTP1Q1 POSTP1Q2 POSTP1Q3 POSTP1Q4 POSTP2Q1 POSTP2Q2
[1,] 1.35 0.0954041 0.08416878 0.5217063 0.73069053 0.6642653
[2,] -0.65 -1.9398833 0.08416878 0.5217063 0.73069053 -2.1326412
[3,] 0.55 0.6042259 1.43086919 0.5217063 0.73069053 0.6642653
[4,] -1.45 -0.6678287 0.08416878 0.5217063 -0.93945925 0.6642653
[5,] 1.35 0.6042259 0.08416878 0.5217063 0.73069053 0.6642653
[6,] -1.45 0.6042259 0.08416878 0.5217063 0.06263062 0.6642653
POSTP2Q3 POSTP2Q4 POSTSCORE POSTAVG
[1,] -0.559017 0.75 1.0436169 1.0436169
[2,] -0.559017 -1.25 -1.5312883 -1.5312883
[3,] 1.677051 -1.25 0.7315072 0.7315072
[4,] -0.559017 -1.25 -0.9070689 -0.9070689
[5,] 1.677051 0.75 1.4337541 1.4337541
[6,] -0.559017 -1.25 -0.2828494 -0.2828494
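For reference, `scale()` with its defaults subtracts the column mean and divides by the column standard deviation. The sketch below reproduces the first column of the output above by hand, using the POSTP1Q1 values as printed earlier.

```r
# POSTP1Q1 after rescaling, copied from the printed output in this document.
postp1q1 <- c(10, 5, 8, 3, 10, 3, 5, 7, 8, 5, 8, 3, 8, 5, 10, 8)

# Standardise by hand: centre on the mean, divide by the standard deviation.
z_manual <- (postp1q1 - mean(postp1q1)) / sd(postp1q1)

# The same via scale(), which returns a one-column matrix.
z_scale <- as.numeric(scale(postp1q1))

head(z_manual)
```

The first six values agree with the POSTP1Q1 column shown above. Each standardised column has mean 0 and standard deviation 1, so every column gets the same weight in the distance computation.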
dist.eucl <- dist(postdf_std, method = "euclidean")
posthc <- hclust(d = dist.eucl, method = "ward.D2")
fviz_dend(posthc, cex = 0.5)