Motivation and research questions

We aim to understand the pedagogy of team-based learning (TBL), i.e., how it works, why it works, which aspects of learning it improves, and which learning outcomes it helps to achieve. This will help us to know when TBL is appropriate, when it is not, and when it is appropriate but requires scaffolding.

The challenges are:

Randomized controlled experiments are hard or impossible to organize. Ideally, to know whether TBL is effective for learning course material, one would split students into an experimental group taught via TBL and a control group taught via lectures and tutorials, and then compare exam scores between the two groups. This is not practical.
It is hard to measure certain aspects of learning. For instance, while content knowledge is directly measured by exam scores, knowledge retention can only be measured by an exam held long after the course is over, and arranging such an exam is impractical. It is even harder to measure communication, leadership, and teamwork skills.

We need to find studies addressing these challenges.

A strength of our study is the sheer amount of data.

Claims about TBL

There are some widespread claims or beliefs about TBL that people seem to take for granted. The objective of our study is to scrutinize some of these claims.

We need to find studies that make these claims and see what kind of justification they provide. Are these claims justified by established learning theories, by quantitative studies, or just by anecdotal evidence?

TBL improves teamwork skills

One such claim is that TBL hones students’ teamwork skills without scaffolding, i.e., students will learn to communicate effectively and work better as a team simply by being exposed to the TBL environment. We want to challenge this claim.

If this claim were true, then teams would be more effective at the end of a term than at the beginning, and this is measurable. If, in addition, all TBL modules had the same difficulty level, then we would see TRA scores increasing from module to module, so that TRA scores at the end of a term would be higher than at the beginning. In practice, however, the difficulty level of questions changes over time in an uncontrolled manner. In order to observe and measure the gain in TRA scores over time, we need to control for the difficulty level of questions.

We will construct statistical models predicting TRA scores from IRA scores: \[ \text{TRA}=f(\text{IRA})+\varepsilon \] IRA scores provide a measure of the difficulty level of questions. Thus the residual of the model, i.e., \[ \text{TRA}-f(\text{IRA}), \] is a measure of teams’ effectiveness. If teams’ effectiveness increases over time, then the data support the claim. Moreover, if we have a linear model \[ \text{TRA}-f(\text{IRA})=\alpha+\beta t+\tilde{\varepsilon}, \] where \(t\) is time (module number), then \(\beta\) is the gain in TRA score per module (roughly per week) due to growing teams’ effectiveness. This \(\beta\) measures how fast teams’ effectiveness improves. Below we outline our results.
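To make the two-stage procedure concrete, here is a minimal R sketch on synthetic data. Everything in it is an assumption made for illustration: the data frame `sim`, the simulated difficulty shifts, the true per-module gain of 0.1, and the choice of a linear \(f\) with the best IRA score (`top1`) as the only predictor. It is not the analysis reported below.

```r
# Synthetic data only: none of these numbers come from the study.
set.seed(1)
n_teams <- 30
n_modules <- 8
sim <- expand.grid(team = seq_len(n_teams), module = seq_len(n_modules))

# Hypothetical per-module difficulty shifts the best IRA score up or down.
difficulty <- rnorm(n_modules, mean = 0, sd = 3)
sim$top1 <- 30 + difficulty[sim$module] + rnorm(nrow(sim), sd = 2)

# Assumed ground truth: TRA depends on IRA plus a small per-module gain (0.1).
sim$TRA <- 10 + 0.7 * sim$top1 + 0.1 * sim$module + rnorm(nrow(sim), sd = 1)

# Stage 1: predict TRA from IRA; the residual is the effectiveness measure.
stage1 <- lm(TRA ~ top1, data = sim)
sim$effectiveness <- residuals(stage1)

# Stage 2: regress the residual on time; the slope estimates beta.
stage2 <- lm(effectiveness ~ module, data = sim)
beta_hat <- coef(stage2)[["module"]]
beta_ci <- confint(stage2, "module")
```

Here `beta_hat` plays the role of \(\beta\) and `beta_ci` gives its 95% confidence interval. Because stage 1 can absorb part of any trend that is correlated with IRA scores, the stage-2 slope should be read as an estimate under the assumption that IRA scores capture difficulty and nothing else.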

Our main finding is that, at least in our context (large undergraduate mathematics class), teams’ effectiveness increases very slowly. Thus if teamwork were the main objective, some scaffolding would be required - pure TBL is not enough to teach teamwork.

We need to find studies about teaching teamwork, learning theories and scaffolding. We need to provide some solution.

TBL is effective for learning content

Another claim about TBL is that TRA scores are almost always higher than IRA scores and that this is evidence of the effectiveness of TBL. We do not doubt that TRA scores are higher than IRA scores, but this is a trivial observation. Of course they are. What we doubt is that getting high TRA scores helps students to learn the material better individually, and we do not think that TRA scores being higher than IRA scores supports the claim that TBL is more effective than lectures.

In order to test this claim, we need to identify effective teams and check whether students in effective teams progress more in individual learning than students in less effective teams. One way to do this is to predict students’ exam scores from their initial level, i.e., to construct a statistical model \[ \text{EXAM}=f(\text{initial level})+\varepsilon \] Here, the residual of the model is a relative measure of progress in individual learning. Then, we can construct another model \[ \text{EXAM}-f(\text{initial level})=\alpha + \beta E+\tilde{\varepsilon}, \] where \(E\) is the team’s effectiveness. If \(\beta\) is positive, it can be interpreted as evidence in favour of the claim.

This is not done yet - we only have some simple preliminary findings. It may be better to do another study of this.

Data preprocessing

Raw data

The data were collected in the course MH3110 - Ordinary Differential Equations over 3 years: 2017, 2018, and 2019. The main mode of content delivery was team-based learning, i.e., IRA (individual readiness assessment), TRA (team readiness assessment), and AE (application exercises). To create the dataset, we merged three years of students’ scores into one table.

We have recorded the cohort of each student (A, B, or C, corresponding to different years), the team, and whether the team was formed by the students themselves or randomized. For each student, the table contains their weekly IRA and TRA scores and their midterm test score. Note that:

AE scores have been removed since application exercises were conducted differently in different years. Mode, assessment, and even the number of application exercises were different.
Week 4 IRA/TRA scores have been removed because this TBL session was too easy and more than half of all teams got the full score.
Week 9 IRA/TRA scores have been removed because this TBL session was introduced only in 2019 and did not exist in 2017 and 2018.
Week 12 IRA/TRA scores have been removed because in this TBL session, only IRA was conducted in 2018 and it was not even graded.
The rest of the TBL sessions (weeks 2, 3, 5, 6, 7, 8, 10, 11) were renamed to modules 1-8.
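
The week-to-module renaming described above can be sketched in a few lines of R. The kept weeks come from the list above; the data frame `scores` and its column names are hypothetical and only illustrate the mapping, they are not the layout of the actual file.

```r
# Weeks kept after dropping weeks 4, 9 and 12 (taken from the list above).
kept_weeks <- c(2, 3, 5, 6, 7, 8, 10, 11)

# Lookup table: names are week numbers, values are module numbers 1..8.
week_to_module <- setNames(seq_along(kept_weeks), kept_weeks)

# Hypothetical long-format scores; the column names are illustrative.
scores <- data.frame(week = c(2, 4, 5, 9, 11, 12),
                     IRA  = c(30, 38, 25, 28, 33, 31))

# Drop the excluded weeks, then translate week numbers into module numbers.
scores <- scores[scores$week %in% kept_weeks, ]
scores$module <- unname(week_to_module[as.character(scores$week)])
```

With this mapping, week 2 becomes module 1, week 5 becomes module 3, and week 11 becomes module 8.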

Midterm test scores cannot be compared across years because the midterm tests were substantially different. For instance, the midterm test in 2018 was probably much harder than in 2017 or 2019. However, midterm test scores of students within one team can be compared: if students A and B are in the same team and the test score of student A is higher than that of student B, then this can be interpreted as student A having been more successful in learning the course material than student B.

Below is a sample of the raw data:

library(ggplot2)
library(tidyverse)
library(reshape2)
library(Matrix)
library(caret)
library(randomForest)
library(e1071)

wide_data <- read.csv("mh3110_tbl_midterm_3years.csv")
wide_data$midterm_test <- as.numeric(as.character(wide_data$midterm_test))

## Warning: NAs introduced by coercion

head(wide_data)

Each record is one student. There are three cohorts (A, B, and C), one for each of the years 2017, 2018, and 2019:

table(wide_data$cohort)

## 
##   A   B   C 
## 210 190 181

The number of teams is

length(unique(wide_data$fake_team))

## [1] 107

The number of teams per cohort

our_cohorts <- unique(wide_data$cohort)
data.frame(
  cohort = our_cohorts,
  num_teams = sapply(our_cohorts,
       function(x) length(unique(wide_data$fake_team[wide_data$cohort == x])))
)

Number of students in teams (first row - \(k\), second row - number of teams with \(k\) students)

table(table(wide_data$fake_team))

## 
##  4  5  6 
##  3 55 49

Data processing

We have created a new dataset where each row describes the performance of one team in one module. The variables are the IRA scores of the team members (NA if the student was absent or if the team has fewer than 6 students), the number of students present in class, and the top 3 and bottom 3 IRA scores of the team members.

We have removed the instances in which fewer than 3 students from a team were present in class.
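
As an illustration of how one row of such a dataset could be derived, the sketch below summarises the IRA scores of a single team in a single module. The function `summarise_team`, its input format, and the `bottom1` to `bottom3` column names are assumptions made for this example; the actual preprocessing script is not shown in this report.

```r
# Summarise one team's IRA scores in one module.
# `ira` is a numeric vector with NA for absent students (hypothetical input).
summarise_team <- function(ira) {
  present <- sort(ira[!is.na(ira)], decreasing = TRUE)
  n <- length(present)
  # Fewer than 3 students present: the instance is dropped from the dataset.
  if (n < 3) return(NULL)
  data.frame(
    n_present = n,
    top1 = present[1], top2 = present[2], top3 = present[3],
    bottom1 = present[n], bottom2 = present[n - 1], bottom3 = present[n - 2]
  )
}

# Two made-up teams: one with 5 of 6 students present, one with only 2.
full_team <- summarise_team(c(35, 20, NA, 28, 31, 15))
small_team <- summarise_team(c(22, NA, NA, 30, NA))
```

Note that when fewer than 6 students are present, the top 3 and bottom 3 scores overlap: for `full_team`, `top3` and `bottom3` are the same score.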

X <- read.csv("mh3110_tbl_midterm_3years_by_week.csv", 
              stringsAsFactors = FALSE)

#X$unit <- gsub("unit ", "", X$unit)
#X$unit <- as.numeric(X$unit)

head(X[ , -c(ncol(X)-1, ncol(X))])

Statistical models for TRA scores

5-variable models

We used symbolic regression to predict the TRA score from individual student scores. The predictors are the number of students present in class, highest IRA score, second highest IRA score, lowest IRA score, second lowest IRA score. Only 3 out of 5 appeared in the final model: highest IRA score, second highest IRA score, and second lowest IRA score: \[ \begin{array}{cccc} & \text{Complexity} & \text{1-}R^2 & \text{Function} \\ 1 & 31 & 0.425551 & 1.61044 \sqrt{\text{top}_2}+0.370283 \text{top}_1-\frac{23.8483}{\text{bottom}_2}+15.838 \\ 2 & 42 & 0.431512 & 0.0125789 \sqrt{\text{bottom}_2} \text{top}_1+0.312742 \text{top}_1+4.7228 \sqrt[3]{\text{top}_2}+8.7116 \\ \end{array} \]

The resulting \(R^2\) (explained variation) is

cor(X$TRA, X$predicted_TRA)^2

## [1] 0.5725489

We have recorded predictions of this model and the added value, i.e., the residual of the model (actual TRA minus the predicted TRA)

head(X[ , c("id", "team", "module", "TRA", "predicted_TRA", "added_value")])
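
For readers who want to evaluate the first symbolic-regression model from the 5-variable table by hand, it can be written as a plain R function. The coefficients are copied from that table; the function name and the example scores are hypothetical and are not taken from the dataset.

```r
# Model 1 from the 5-variable table, written as an R function.
# top1, top2: highest and second highest IRA score in the team;
# bottom2: second lowest IRA score in the team.
predict_tra_model1 <- function(top1, top2, bottom2) {
  1.61044 * sqrt(top2) + 0.370283 * top1 - 23.8483 / bottom2 + 15.838
}

# Hypothetical IRA scores for two teams (not taken from the dataset).
example_pred <- predict_tra_model1(top1 = c(40, 30),
                                   top2 = c(36, 25),
                                   bottom2 = c(25, 10))
```

The term \(-23.8483/\text{bottom}_2\) means that the prediction drops sharply only when the second lowest score is very small, so the function should not be evaluated at `bottom2 = 0`.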

Univariate models

We have also tried to use just the top score as a predictor. Below is the list of our models with the top score as the only predictor \[ \begin{array}{cccc} & \text{Complexity} & \text{1-}R^2 & \text{Function} \\ 1 & 27 & 0.466065 & 0.375512 \max \left(23.0903,\text{top}_1\right)-\frac{180.563}{\text{top}_1}+28.6088 \\ 2 & 31 & 0.467707 & -0.0128802 \text{top}_1{}^2+1.58583 \text{top}_1-8.84388\, +\frac{202.555}{\text{top}_1} \\ 3 & 31 & 0.46796 & -0.216013 \text{top}_1{}^{2.0499}+0.271857 \text{top}_1{}^2+19.5737 \\ \end{array} \]

Note that these models are worse than the 5-variable models by only about 0.04 in \(1-R^2\), i.e., about 4 percentage points of explained variance.

Univariate linear model

In fact, it turns out that a plain linear model is not really worse than the non-linear models. Below is \(1-R^2\) (the unexplained fraction of variance) for the linear model, which is directly comparable with the tables above.

fit <- lm(data = X, TRA ~ top1)
1-summary.lm(fit)$r.squared

## [1] 0.4692366

It means that the main contributor to TRA is the top IRA score of all the students in that team on that week. Specifically, about 53% of variance in TRA is explained by the best individual score and the rest is mostly due to unobserved factors, like the specific nature of IRA/TRA questions on that day, and random noise rather than due to other students’ performance.
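
Two different summaries of fit appear in this section: \(R^2\) computed as a squared correlation between observed and predicted values, and \(1-R^2\) computed from the model summary. The sketch below, on synthetic data (the data frame `toy` and all its numbers are made up), shows how they relate for a linear model with an intercept.

```r
# Synthetic data; the variable names mirror the report but the values do not.
set.seed(2)
toy <- data.frame(top1 = runif(200, 15, 40))
toy$TRA <- 12 + 0.6 * toy$top1 + rnorm(200, sd = 3)

toy_fit <- lm(TRA ~ top1, data = toy)
r2 <- summary(toy_fit)$r.squared

# For a linear model with an intercept, R^2 equals the squared correlation
# between observed and fitted values, and also 1 - RSS / TSS.
r2_from_cor <- cor(toy$TRA, fitted(toy_fit))^2
r2_from_rss <- 1 - sum(residuals(toy_fit)^2) / sum((toy$TRA - mean(toy$TRA))^2)

# Unexplained fraction of variance, the quantity printed for the linear model.
unexplained <- 1 - r2
```

For non-linear models the squared correlation and \(1-\text{RSS}/\text{TSS}\) need not coincide exactly, which may be why the squared correlation of 0.5725 reported for the 5-variable model differs slightly from \(1-0.425551\).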

Study

Our real-life research question boils down to the following statistical question:

What factors are related to and influence the part of variance in TRA scores that is unexplained by the best IRA score in the team on that week?

We will create a new variable - gain. It equals the actual TRA score minus the TRA score as predicted by the best IRA score, i.e., the residual of the linear model.

X$gain <- X$TRA - predict(fit, X)
head(X)

Note that both “added_value” and “gain” are residuals of predictive models — “added_value” is the residual of the multivariable non-linear model and “gain” is the residual of the univariate linear model. The correlation between added_value and gain is

cor.test(X$added_value, X$gain)

## 
##  Pearson's product-moment correlation
## 
## data:  X$added_value and X$gain
## t = 94.404, df = 822, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9506756 0.9622598
## sample estimates:
##       cor 
## 0.9568462

Module

If teamwork gets better over time, then we should see that part of the variance in gain can be explained by the module. Specifically, we should see the gain increasing from module 1 to module 8.

However, this does not happen, or at least it is not easy to see with the naked eye:

ggplot(data = X[ , ], 
       aes(x = module, y = gain, group = module)) +
  geom_boxplot()

This may be due to a confounding effect of team formation. It is possible that teams formed by the students themselves do not get better at teamwork since the members already know each other. If that is the case, then we should see increasing gain for teams that were formed randomly. It does not happen:

ggplot(data = X[ , ], 
       aes(x = module, y = gain, group = module)) +
  facet_wrap(. ~ team_type) +
  geom_boxplot()

Or maybe we have a confounding effect of the cohort. Let us plot the gain as a function of module by cohort:

ggplot(data = X[ , ], 
       aes(x = module, y = gain, group = module)) +
  facet_wrap(. ~ cohort) +
  geom_boxplot()

Statistical modelling

Linear models

Here we are testing the hypothesis that the gain increases over time. If this were the case, then the coefficient of module in linear models for the gain would be positive.

Just the module: positive, insignificant

summary.lm(lm(data = X, gain ~ module))

## 
## Call:
## lm(formula = gain ~ module, data = X)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -22.2546  -1.9621   0.5695   1.0748  14.2073 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.36756    0.25353  -1.450    0.148
## module       0.08235    0.05060   1.627    0.104
## 
## Residual standard error: 3.307 on 822 degrees of freedom
## Multiple R-squared:  0.003212,   Adjusted R-squared:  0.001999 
## F-statistic: 2.649 on 1 and 822 DF,  p-value: 0.104

Module and second best student result: positive, significant at the 0.1 level

summary.lm(lm(data = X, gain ~ top2 + module))

## 
## Call:
## lm(formula = gain ~ top2 + module, data = X)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -21.5511  -1.8178   0.1164   1.3166  15.2546 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.66736    0.55044  -4.846 1.51e-06 ***
## top2         0.07657    0.01632   4.691 3.18e-06 ***
## module       0.09533    0.05004   1.905   0.0571 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.265 on 821 degrees of freedom
## Multiple R-squared:  0.02924,    Adjusted R-squared:  0.02687 
## F-statistic: 12.36 on 2 and 821 DF,  p-value: 5.129e-06

Module and cohort dummies: positive, insignificant

summary.lm(lm(data = X, gain ~ cohort + module))

## 
## Call:
## lm(formula = gain ~ cohort + module, data = X)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -21.7354  -1.8352   0.1654   1.5123  13.3805 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.49813    0.29242   1.703   0.0889 .  
## cohortB     -1.29827    0.27537  -4.715 2.85e-06 ***
## cohortC     -1.36057    0.27386  -4.968 8.23e-07 ***
## module       0.07748    0.04972   1.558   0.1195    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.248 on 820 degrees of freedom
## Multiple R-squared:  0.04072,    Adjusted R-squared:  0.03721 
## F-statistic:  11.6 on 3 and 820 DF,  p-value: 1.878e-07

Module and team type (formed by students vs random) dummy: positive, insignificant

summary.lm(lm(data = X, gain ~ team_type + module))

## 
## Call:
## lm(formula = gain ~ team_type + module, data = X)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -22.1160  -1.9847   0.5216   1.1308  14.1482 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)     -0.50576    0.30851  -1.639    0.102
## team_typerandom  0.19781    0.25149   0.787    0.432
## module           0.08228    0.05061   1.626    0.104
## 
## Residual standard error: 3.307 on 821 degrees of freedom
## Multiple R-squared:  0.003962,   Adjusted R-squared:  0.001536 
## F-statistic: 1.633 on 2 and 821 DF,  p-value: 0.196

All together: positive, significant at 0.1 level.

summary.lm(lm(data = X, gain ~ top2 + team_type + cohort + module))

## 
## Call:
## lm(formula = gain ~ top2 + team_type + cohort + module, data = X)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -21.288  -1.672   0.085   1.502  14.373 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -1.29971    0.65647  -1.980   0.0481 *  
## top2             0.06798    0.01624   4.187 3.13e-05 ***
## team_typerandom -0.30703    0.26979  -1.138   0.2554    
## cohortB         -1.32325    0.29232  -4.527 6.88e-06 ***
## cohortC         -1.42776    0.29263  -4.879 1.28e-06 ***
## module           0.08912    0.04926   1.809   0.0708 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.212 on 818 degrees of freedom
## Multiple R-squared:  0.064,  Adjusted R-squared:  0.05828 
## F-statistic: 11.19 on 5 and 818 DF,  p-value: 1.904e-10

Module and team dummies: positive, insignificant

summary.lm(lm(data = X, gain ~ team + module))

## 
## Call:
## lm(formula = gain ~ team + module, data = X)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18.3065  -1.5013   0.1507   1.6302  12.4457 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -1.82972    1.09497  -1.671  0.09515 .  
## team2001 CA21   1.12911    1.51975   0.743  0.45775    
## team2001 UO     0.40229    1.64181   0.245  0.80651    
## team2001 YN2    2.93956    1.51975   1.934  0.05348 .  
## team2003 UM3    2.56456    1.51975   1.687  0.09195 .  
## team2004 ME6    1.15131    1.57312   0.732  0.46449    
## team2005 ED224  3.46772    1.51975   2.282  0.02280 *  
## team2005 NX55   0.90728    1.51975   0.597  0.55070    
## team2005 TM173  8.05633    1.51975   5.301 1.53e-07 ***
## team2006 CD    -0.21361    1.51975  -0.141  0.88826    
## team2006 GU2    0.53639    1.51975   0.353  0.72423    
## team2006 QV89  -2.24078    1.73309  -1.293  0.19645    
## team2006 SF281  2.97183    1.51975   1.955  0.05092 .  
## team2006 WP1    3.59683    1.51975   2.367  0.01821 *  
## team2007 FT3    1.42366    1.64159   0.867  0.38610    
## team2007 KO4   -1.87500    1.51975  -1.234  0.21770    
## team2007 SN6    3.59683    1.51975   2.367  0.01821 *  
## team2007 VD8   -0.81456    1.51975  -0.536  0.59214    
## team2007 YM     2.25822    1.51975   1.486  0.13774    
## team2008 EK68   2.15317    1.51975   1.417  0.15698    
## team2008 EM68   2.47183    1.51975   1.626  0.10429    
## team2008 JD33   4.00822    1.51975   2.637  0.00854 ** 
## team2008 SH148  1.75822    1.51975   1.157  0.24769    
## team2008 TE     2.37911    1.51975   1.565  0.11792    
## team2008 UM1    2.96772    1.51975   1.953  0.05124 .  
## team2008 US     2.22051    1.57318   1.411  0.15854    
## team2008 UV99   1.75686    2.05788   0.854  0.39354    
## team2008 UY91   2.28228    1.51975   1.502  0.13360    
## team2008 VB4   -1.28069    1.64168  -0.780  0.43558    
## team2008 VL     2.50000    1.51975   1.645  0.10041    
## team2008 VM     1.71772    1.51975   1.130  0.25874    
## team2008 VS4    2.00822    1.51975   1.321  0.18678    
## team2008 XK     1.53228    1.51975   1.008  0.31368    
## team2008 YD3    2.77817    1.51975   1.828  0.06796 .  
## team2009 FG     2.88322    1.51975   1.897  0.05821 .  
## team2009 FZ4   -1.08861    1.51975  -0.716  0.47403    
## team2009 JE1    2.50411    1.51975   1.648  0.09985 .  
## team2009 JF1   -1.28228    1.51975  -0.844  0.39910    
## team2009 TB     1.97168    1.57318   1.253  0.21050    
## team2009 VZ39   3.78639    1.51975   2.491  0.01295 *  
## team2010 AU118  2.40317    1.51975   1.581  0.11425    
## team2010 CA     0.50000    1.51975   0.329  0.74225    
## team2010 CA55   3.03639    1.51975   1.998  0.04610 *  
## team2010 DG77  -0.81044    1.51975  -0.533  0.59401    
## team2010 DJ77   0.80616    1.57318   0.512  0.60850    
## team2010 HP20  -2.39905    1.51975  -1.579  0.11487    
## team2010 JA43   0.93956    1.51975   0.618  0.53662    
## team2010 LJ68   1.20189    1.51975   0.791  0.42930    
## team2010 MY112  0.82278    1.51975   0.541  0.58841    
## team2010 UJ     1.16550    1.51975   0.767  0.44339    
## team2010 VO139  2.81867    1.51975   1.855  0.06405 .  
## team2010 VP139  4.02678    1.57326   2.560  0.01069 *  
## team2010 VR139  2.46361    1.51975   1.621  0.10544    
## team2010 WW8    1.78639    1.51975   1.175  0.24021    
## team2010 XB73   2.03639    1.51975   1.340  0.18069    
## team2010 XC     1.44778    1.51975   0.953  0.34109    
## team2010 XN69  -0.15728    1.51975  -0.103  0.91760    
## team2010 XP     1.56456    1.51975   1.029  0.30360    
## team2010 XQ     1.19778    1.51975   0.788  0.43087    
## team2011 CF66   2.75822    1.51975   1.815  0.06995 .  
## team2011 CW46   2.25822    1.51975   1.486  0.13774    
## team2011 DX4   -2.10095    1.51975  -1.382  0.16727    
## team2011 KF36   1.26233    1.51975   0.831  0.40647    
## team2012 BK14   1.94778    1.51975   1.282  0.20038    
## team2012 BL14   2.47595    1.51975   1.629  0.10372    
## team2012 BP123  3.62911    1.51975   2.388  0.01720 *  
## team2012 TC4   -1.58564    1.57326  -1.008  0.31386    
## team2013 YB     3.10095    1.51975   2.040  0.04167 *  
## team2014 HD198  1.19778    1.51975   0.788  0.43087    
## team2014 HJ198  1.31044    1.51975   0.862  0.38882    
## team2014 HN198  2.19778    1.51975   1.446  0.14857    
## team2014 HR197  5.00411    1.51975   3.293  0.00104 ** 
## team2014 JT79   3.43544    1.51975   2.261  0.02409 *  
## team2014 JV79   0.66550    1.51975   0.438  0.66159    
## team2014 MA68  -0.83861    1.51975  -0.552  0.58125    
## team2014 ML67   0.59061    1.64218   0.360  0.71921    
## team2014 MO68   1.04112    1.57310   0.662  0.50829    
## team2014 MV67   0.62089    1.51975   0.409  0.68299    
## team2014 NJ65   4.09683    1.51975   2.696  0.00719 ** 
## team2014 OY391  2.38322    1.51975   1.568  0.11728    
## team2015 BS516  2.75411    1.51975   1.812  0.07037 .  
## team2015 HM182  1.32278    1.51975   0.870  0.38438    
## team2015 HO182  0.37911    1.51975   0.249  0.80308    
## team2015 HQ182  1.85233    1.57326   1.177  0.23943    
## team2015 HS182  2.50411    1.51975   1.648  0.09985 .  
## team2015 HV182  1.91961    1.51975   1.263  0.20696    
## team2015 HW182  2.03228    1.51975   1.337  0.18157    
## team2015 MC131  2.22183    1.51975   1.462  0.14419    
## team2015 ME131 -1.38322    1.51975  -0.910  0.36304    
## team2015 YV20   1.69367    1.51975   1.114  0.26547    
## team2016 AZ193  0.93956    1.51975   0.618  0.53662    
## team2016 JG38   1.56867    1.51975   1.032  0.30233    
## team2016 JO38   3.13322    1.51975   2.062  0.03960 *  
## team2016 JP38   1.06456    1.51975   0.700  0.48385    
## team2016 JT38   3.34683    1.51975   2.202  0.02797 *  
## team2016 LP10  -1.12500    1.51975  -0.740  0.45939    
## team2016 NL56   1.72183    1.51975   1.133  0.25761    
## team2016 PO66   0.08450    1.51975   0.056  0.95568    
## team2016 PR66   2.25822    1.51975   1.486  0.13774    
## team2016 QY84   1.75411    1.51975   1.154  0.24880    
## team2016 RP41  -0.28654    1.57318  -0.182  0.85552    
## team2016 VA18   0.44519    1.64198   0.271  0.78637    
## team2016 WN55   1.68723    1.64168   1.028  0.30442    
## team2017 AA21   2.53228    1.51975   1.666  0.09610 .  
## team2017 AE21  -2.68600    1.73329  -1.550  0.12167    
## team2017 AR20  -0.34272    1.51975  -0.226  0.82165    
## team2017 AY20  -1.52817    1.51975  -1.006  0.31498    
## module          0.06498    0.04668   1.392  0.16436    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.04 on 716 degrees of freedom
## Multiple R-squared:  0.2664, Adjusted R-squared:  0.1568 
## F-statistic:  2.43 on 107 and 716 DF,  p-value: 6.809e-12

Conclusion: the effect of module is very small. Depending on the model, the gain grows on average by roughly 0.06 to 0.10 points with every new module, on a test worth 40 points (8 questions, 5 points each). It means that even if team synergy is getting better as time goes by, the process is extremely slow.

More specifically, here is the confidence interval for the most optimistic model

fit <- lm(data = X, gain ~ top2 + module)
confint(fit, "module")

##               2.5 %    97.5 %
## module -0.002891639 0.1935501

Taking the upper limit as the true value of the coefficient, we see that the total increase in the gain during the whole semester is

confint(fit, "module")[2] * 7

## [1] 1.35485

Thus an increase of about 1.35 points (in a test with 8 questions, each worth 5 points) is the most optimistic estimate of the total increase in TRA results due to improving teamwork over the course of the whole semester. To put it into some context, it means that due to improvement in teamwork, a team that gets one difficult question on the third attempt in module 1 would get it on the second attempt in module 8. This is not a lot.
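
The arithmetic behind this estimate can be reproduced directly. The upper confidence limit is copied from the output above; the variable names are ours, and the last line, which expresses the gain as a share of the maximum TRA score, is an extra step added for context.

```r
upper_per_module <- 0.1935501   # upper 95% limit for the module coefficient
n_steps <- 8 - 1                # module 1 to module 8 is seven steps
max_score <- 8 * 5              # 8 questions, 5 points each

# Most optimistic total gain over the semester, in points.
total_gain <- upper_per_module * n_steps

# The same gain as a fraction of the maximum TRA score.
share_of_max <- total_gain / max_score
```

Under these numbers the most optimistic total gain is about 1.35 points, or about 3.4% of the maximum TRA score.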

We have also tried modelling with symbolic regression but it did not change the main finding — increase in TRA performance over time is very slow.

Some thoughts: there is nothing wrong with this, and it certainly does not mean that TBL is not working. If the main learning objective is gaining domain knowledge rather than acquiring teamwork skills, then as long as we see that students are learning the course material actively and are happy, slow progress in teamwork is not a major concern. But if teamwork is an official learning outcome, then pure TBL may not be sufficient. Some scaffolding is needed.

Variance across teams

This part is incomplete, i.e., just some thoughts about how this could be done and some exploratory analysis. First, we look at the distribution of gain in teams. We see that, indeed, teams have different median gain.

ggplot(data = X, 
       aes(x = team, y = gain, group = team)) +
  theme(axis.text.x = element_text(angle = 45, size = 5, hjust = 1)) +
  geom_boxplot()

Below we create a table of all the teams with the median gain and the median midterm test score in each team.

hru <- aggregate(wide_data$midterm_test, by = list(wide_data$fake_team),
                 FUN = median, na.rm = TRUE)
names(hru) <- c("team", "median_mt")

team_performance <- aggregate(X$gain, 
          by = list(X$team, X$cohort, X$team_type), FUN = median, na.rm = TRUE)

names(team_performance) <- c("team", "cohort", "formation", "median_gain")

team_performance <- merge(team_performance, hru, by = "team")
team_performance

The most successful teams are

team_performance[team_performance$median_gain > 2 , ]

The least successful teams are

team_performance[team_performance$median_gain < -2 , ]

Comparing team performance by cohort

ggplot(data = team_performance, aes(x = cohort, 
                          y = median_gain)) + geom_boxplot()

And by type of team formation

ggplot(data = team_performance, aes(x = formation, 
                          y = median_gain)) + geom_boxplot()

What is the relation of the midterm test score and team success?

ggplot(data = team_performance, aes(x = median_gain, 
                                    y = median_mt, 
                                    group = cohort, 
                                    colour = cohort)) +
  geom_point() + geom_smooth(method = "lm")

Now a statistical model

summary.lm(lm(data = team_performance,
              median_mt ~ cohort + median_gain))

## 
## Call:
## lm(formula = median_mt ~ cohort + median_gain, data = team_performance)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -16.4613  -5.3860  -0.1445   4.9716  26.1120 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  58.2023     1.3391  43.465  < 2e-16 ***
## cohortB     -20.0296     1.9155 -10.457  < 2e-16 ***
## cohortC      14.1513     1.9299   7.333 5.31e-11 ***
## median_gain   0.9059     0.4781   1.895   0.0609 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.822 on 103 degrees of freedom
## Multiple R-squared:  0.7699, Adjusted R-squared:  0.7632 
## F-statistic: 114.9 on 3 and 103 DF,  p-value: < 2.2e-16

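We see that there is, indeed, very weak evidence supporting the claim: the coefficient of median_gain is positive (about 0.91) but significant only at the 0.1 level (p = 0.061). More detailed investigation is needed.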

Process TBL scores