Introduction

“The town had a low wall of no great extent on one side, and to attack this the Romans employed three picked maniples. […] The men of the first held their shields over their heads, and closed up, so that, owing to the density of the bucklers, it became like a tiled roof […] in the shape of a tortoise (testudo)”. Polybius, The Histories – Book 28.11 [1]

The Roman Testudo is a well-known example of a military formation in which soldiers join their shields to achieve a common goal, such as protecting themselves against a threat or letting other soldiers walk upon the shields whenever they come to a narrow ravine. Nonetheless, this powerful feature came at a price: the Testudo was said to advance slowly in combat, since the soldiers had to coordinate themselves. Accordingly, the Roman Testudo and its trade-off can be used as a metaphor for a situation in which students are expected to work together and solve a problem as a team.

The evidence is still mixed on whether working in teams is an appropriate method to prepare students for the challenges of a constantly changing business environment. On the one hand, some teachers prefer to give instruction via teacher-centered methods (lectures with little text reading and student discourse), under the belief that the best way to ensure content learning is for the instructor to present all necessary information to students (McKeachie and Svinicki, 2013). On the other hand, some scholars claim that traditional teaching methods do not enable all students to appropriately engage with the types of academic literacy constitutive of higher education (Hake, 1998; Lea and Street, 2006). Hence, this article starts with a simple intuition to bridge the two viewpoints: if we assume that the team itself is an important outcome of a team project, could we assess, at the end of the course, whether the students would have been more or less effective without it? Indeed, there is a consensus on the difficulty of correctly assessing the performance of each student in a team project (Brazhkin & Zimmerman, 2019), and most educators lack a simple tool to do it. Nonetheless, most previous work has considered the team as noise to be cancelled in order to assess the individual, whereas we consider it the most important artefact of a course that asks students to work in teams to solve real-world projects and to reflect on what they learned by doing so.

According to Kolb (2015), learning is the process whereby knowledge is created through the transformation of experience. Group-based learning is seen as a form of experiential learning and it has been termed differently through the years: (a) small group learning (Springer et al., 1999) includes activities where the teacher lectures for 15–20 minutes and then asks students to pair with the student beside them to discuss a question, (b) collaborative learning involves carefully planned and structured group activities that are infused into a course of learning, whereas (c) team-based learning (TBL) makes intense use of small groups in that it changes the structure of the course, in order to develop and then take advantage of the special capabilities of high-performance learning teams (Michaelsen et al., 2004). According to its authors, TBL is an important opportunity for teamwork skill development, experiential learning, and learning from peers. However, TBL presents many challenges and is most appropriate in courses that meet two conditions: (1) students are required during the course to understand a significant body of information and (2) a primary goal of the course is to apply this content by solving problems, answering complex questions and resolving issues (Swanson et al., 2019).

Accordingly, our research question is: “how can we design a summative assessment of individual and team performance in a team-based learning scenario?”

The rest of the paper proceeds as follows. Section 2 briefly reviews the existing body of knowledge to answer our research question. Section 3 describes design science as our chosen methodology, highlights the relevant elements of the course which applies the Testudo method and then describes how to create and test the DuoTest prototype. Section 4 presents our preliminary findings, whereas Section 5 concludes the paper by discussing the contribution and shortcomings of our work.

2 Literature review

In this section, we briefly assess the existing body of knowledge and define three constructs to avoid the jingle fallacy (constructs with the same name referring to different phenomena): (a) team health, which can be used to assess how well individuals work together in a team, (b) transactivity, to assess how each individual in a team can build on previous work by team members and (c) the immediate feedback assessment technique, a tool used for summative evaluation in team-based learning that could be used to assess transactivity.

2.1 I2T: Individual contributions for the Team health

Recent work by O’Neill et al. (2020) presents a set of 18 questions to rapidly and reliably assess team health by asking team members to describe their perception of team communication, adaptability, relationships and education. Other scholars have suggested that assessment in TBL should take into account the cognitive, affective and behavioral dimensions (Brazhkin & Zimmerman, 2019). Indeed, students have multiple goals and motivations, which influence the team performance: mastery goals (“I want to learn new things”) and social responsibility goals (“I want to help my peers”) prevail in effective teams, whereas belongingness goals (e.g., “I want my peers to like me”) were more important than mastery goals in ineffective teams (Hijzen et al., 2007).

2.2 T2I: Team effect on the Individual performance

To some degree, the group product will be codified in an artifact (e.g., group report, dialogue, diagram, etc.), but the individual experience of that collaborative learning event will be transposed to future collaborative learning events (Strijbos, 2010). Accordingly, the team effect can be associated with transactivity, that is, the extent to which students refer to and build on each other’s contributions. It can be measured as reflected in collaborative dialogue or individual products, or as the extent to which students transform a shared artifact (e.g., a group report) (Weinberger et al., 2007).

2.3 Gap in the literature: how to assess transactivity

The immediate feedback assessment technique (IF-AT) form has a series of boxes covered by an opaque, waxy coating similar to that found on scratch-off lottery tickets; the boxes correspond to the alternatives, and only the correct alternative has a small star in it (Maurer & Kropp, 2015). The authors found that students who took the final exam with IF-AT scored 10% more on average when they got partial credit for iterative responding (they could scratch more than one box). Although this approach is already used in team-based learning scenarios (Mazur, 1999), there is no simple way to use it to assess how team transactivity influences individual performance.

3 Chosen methodology to develop and test the artefact

We position our study in the field of design science research (Hevner et al., 2004) and we developed an artefact in the shape of a prototype (March & Smith, 1995), following the guidelines of Peffers et al. (2007).

(1) Identify problem and motivate. We describe an example of a course on organization design, in which we would like to assess transactivity. At the beginning of the semester, students play a multi-round business simulation game (Martin-Rios & Erhardt, 2019). In this phase, students are assigned to a new random group every week, to learn how to rapidly work together and make decisions under uncertainty. After four weeks, students form groups of at most five team members. In this phase, students are assigned for eight weeks to a real project carried out with an external firm. All projects respect the five criteria for a project-based learning activity (Thomas, 2000): (a) projects are central to the curriculum, since the score given to the students' reports counts as their midterm exam, (b) they are focused on problems that ‘drive’ students to encounter/struggle with the central concepts of a discipline, (c) they involve students in a constructive investigation, since students have to help the firm make sense of its data to find the solution, (d) they are student-driven to a significant degree, and (e) they are realistic and not school-like. Every week, students are asked to fill in a new section of the report and to submit it to a Moodle Workshop activity (Moodle, 2019a), where it is assessed by their peers. During each class, the teacher briefly clarifies the required activities and facilitates discussions among team members. Slides are seldom presented in class, since they are available to students in advance, together with check-up questions, as Moodle Lessons (Moodle, 2019b).

(2) Define objectives of the solution. We wanted to improve the immediate feedback assessment technique (IF-AT) by developing an online solution that allows students to take the final exam by themselves and then to obtain partial credit if they manage to correct their mistakes by discussing them with their team members. This way, we could measure the degree of transactivity in each team. Accordingly, we state three hypotheses, which we would like to test:
• H1: the individual performance in Exa01 has a positive and statistically significant effect on the individual performance in Exa02. This statement is supported by all the reviewed literature on team-based learning.
• H2: the team performance (transactivity) has a statistically significant effect on the individual performance in Exa02. If this hypothesis is correct, we should be able to see different improvements in different teams, depending on their degree of transactivity.
• H3: the team performance has a positive and statistically significant effect on the individual performance in Exa02. H3 extends H2. Based on previous results from Maurer & Kropp (2015) on IF-AT with partial credit, we could assume that students who have the possibility to correct their mistakes by discussing with their team will improve their final score.

(3) Design and development of the artefact: the DuoTest prototype. The underlying idea of DuoTest is simple: students take their final exam twice in a row. The first time, participants do the exam individually (Exa01); the second time, they solve the same exam in groups (Exa02). By comparing individual and team performances, the system infers the positive (or negative) effect of each group on the individual performances.

(4) Demonstration. Before the exam, we create a Moodle Quiz activity (Moodle, 2019) with ten questions: five theoretical questions and five questions about a case study. The type of the ten questions is Short Answer (Moodle, 2020): this will be relevant when we explain how to analyze the data after the exam. In the parameters of the Moodle Quiz activity, hereinafter referred to as Exa01, we set the duration to 35 minutes. Then, we copy the Quiz activity; the copy is hereinafter referred to as Exa02. This way, the questions of Exa02 are the same as those of Exa01. In the parameters of Exa02, we set the beginning of the activity 5 minutes after the end of Exa01, to allow students the logistical time to set up their teams in the class. The duration of Exa02 is set to 20 minutes, which brings the total to 60 minutes. Finally, in the Moodle Gradebook (Moodle, 2019), we set the score of the final exam as the average of Exa01 and Exa02. During the exam, students are expected to do Exa01 by themselves and without additional material. When Exa01 is over after 35 minutes, each student joins the team members with whom they have been working between weeks 5 and 12. Students can talk among themselves during Exa02 and they have access to any type of material. Indeed, Exa02 recreates the conditions that the team has experienced during the semester and allows educators to assess in detail the dynamics of each team. After the test, each answer is corrected by using a special feature of Short Answer questions: the educator defines a set of rules in the parameters of each question, and the answers of all students are corrected automatically by Moodle. This ensures a consistent assessment throughout and increases the rigor of the overall process.
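To illustrate what a rule-driven correction of a short answer can look like, the sketch below grades a free-text response against an ordered list of rules and then averages the two attempts. It is a minimal illustration and not Moodle's implementation: the function names, the `*` wildcard handling and the two example rules are assumptions introduced here.

```python
import re

def rule_to_regex(rule):
    """Turn an answer rule with '*' wildcards into a case-insensitive regex."""
    parts = [re.escape(part) for part in rule.strip().split("*")]
    return re.compile("^" + ".*".join(parts) + "$", re.IGNORECASE)

def grade_answer(response, rules):
    """Return the credit of the first rule matching the response, else 0."""
    cleaned = " ".join(response.split())  # collapse repeated whitespace
    for pattern, credit in rules:
        if rule_to_regex(pattern).match(cleaned):
            return credit
    return 0.0

def final_exam_score(exa01, exa02):
    """Average of the individual attempt and the team attempt."""
    return (exa01 + exa02) / 2

# Hypothetical rules for one question: full credit first, then partial credit.
rules = [("*functional structure*", 1.0), ("*functional*", 0.5)]

print(grade_answer("A  Functional Structure fits best", rules))
print(grade_answer("matrix", rules))
print(final_exam_score(10.00, 8.33))
```

Because the same ordered rules are applied to every response, two students who give the same answer always receive the same credit, which is the property the demonstration relies on.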

(5) Evaluation. We tested our prototype with three classes of undergraduate students taking the same course, for a total of 71 students attending the final exam in Sierre (Switzerland) on 20 January 2020. We claim that the exam was (a) valid, since the chosen questions provide useful information about the concepts seen in class, (b) reliable, thanks to the rule-driven correction of each question, and (c) recognizable, since it fully replicated the way students work during the semester.

Preliminary data analysis

This section analyses the results of the individual and the group exams, which are shown in Figure 4.1.

The scores dataset contains 66 rows. A sample of five rows is shown below.

UID  Class  Group  Attendance (20%)  Midterm Exam (30%)  Exam 01 (25%)  Exam 02 (25%)
23   2      8      5.5               5.68                10.00          8.33
37   2      16     5.5               4.45                8.79           10.00
25   1      14     5.4               5.88                10.00          9.79
8    2      8      5.2               5.68                6.40           7.98
44   2      8      5.4               5.06                9.05           8.98

The first column shows the anonymized unique identifier (UID) of each student, followed by the class and the group number. The other columns show the attendance score, the score of the midterm evaluation, the result of the part of the final exam done individually (Exa01) and the result of the part done in group (Exa02).
The graphic representation of the results is shown in Figure 4.1. Each student in a team corresponds to a colored dot; students are grouped by team on the horizontal axis. This way, it is possible to see the difference between the result of the first exam (graph 4.1a) and the second exam (graph 4.1b) for a specific student in a specific group.

One could expect the results of the second exam to be better than the first one for three main reasons: (a) the second exam is open book, (b) the students can talk to each other, (c) the students can check their material in groups and be more effective.
This intuition is confirmed by group G02, which had a strong concentration of scores below 6/10 in the first exam and then shifted above 8/10 in the second exam.
Also, some teams performed better than others, with team G15 bringing all team members up to 10/10 and group G07 bringing a dispersed set of points in the first exam above 9/10 in the second exam.
Nonetheless, some teams performed worse in the second exam: groups G12 and G13 are the most evident examples of individuals who changed some correct answers into wrong answers after discussing with the rest of the team. Finally, group G04 had a student who attended the exam, but did not do it (row 18 in the table of Appendix A).
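The visual comparison between the two exams can also be summarized numerically as the mean shift (Exa02 minus Exa01) per team. The sketch below uses only the five sample rows shown earlier in this section, so its output illustrates the computation and not the results for the full class; the function name is an assumption.

```python
from collections import defaultdict
from statistics import mean

# (UID, Group, Exa01, Exa02) for the five sample rows of the scores dataset.
sample = [
    (23, 8, 10.00, 8.33),
    (37, 16, 8.79, 10.00),
    (25, 14, 10.00, 9.79),
    (8, 8, 6.40, 7.98),
    (44, 8, 9.05, 8.98),
]

def mean_shift_by_group(rows):
    """Average individual change between the two exams, per team."""
    shifts = defaultdict(list)
    for _uid, group, exa01, exa02 in rows:
        shifts[group].append(exa02 - exa01)
    return {group: mean(deltas) for group, deltas in sorted(shifts.items())}

for group, shift in mean_shift_by_group(sample).items():
    print(f"G{group:02d}: {shift:+.2f}")
```

A positive value means that, on average, the members of that team scored higher in the group exam than in the individual exam; a value close to zero can still hide opposite movements within the same team, as the three sampled members of group 8 show.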

To assign some quantitative data to our assessment, we start by scaling the raw data presented in Appendix A, in order to properly compare the coefficients of each variable.
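As a reminder of what scaling does, the sketch below standardizes a vector of scores to mean 0 and standard deviation 1, using the sample standard deviation (n - 1 in the denominator), which is also what R's `scale()` uses by default. The input values are the Exa01 scores of the five sample rows and serve only as an illustration.

```python
from statistics import mean, stdev

def scale(values):
    """Center on the mean and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

exa01 = [10.00, 8.79, 10.00, 6.40, 9.05]
z_exa01 = scale(exa01)
print([round(z, 2) for z in z_exa01])
```

After scaling, a coefficient expresses a change in standard deviations of the outcome per standard deviation of the predictor, which makes coefficients of variables measured on different scales comparable.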

Moreover, after looking for outliers with a large residual, we identify and remove the outlier in row 43 (a student who attended the exam, but did not do it).

Table 4.1 illustrates that the performance in the first exam (Exa01) positively affects the score of the second exam (Exa02), with a coefficient of 0.20 (hence Exa02 = 0.20*Exa01 on the scaled variables). The value of p = 0.09 shows that the relationship between the two variables is statistically significant only at the 10% level. Therefore, we find support for hypothesis H1, and affirm that there is a positive effect of the first exam (done individually) on the second exam (done in group). Nonetheless, the Adjusted R2 = 0.03 suggests that the explanatory power of this model is fairly low. Moreover, the value of the intercept does not confirm that the shift from closed book to open book led to a positive effect across all students.
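When both variables are scaled, a regression with a single predictor has two convenient properties: the intercept is zero and the slope equals the Pearson correlation between the two variables. The sketch below checks this on the five sample rows of the scores dataset; the helper names are assumptions, and the resulting values are not the coefficients reported for the full class.

```python
from math import sqrt
from statistics import mean, stdev

def scale(values):
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def ols_simple(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

exa01 = [10.00, 8.79, 10.00, 6.40, 9.05]
exa02 = [8.33, 10.00, 9.79, 7.98, 8.98]

intercept, slope = ols_simple(scale(exa01), scale(exa02))
print(round(intercept, 6), round(slope, 6), round(pearson(exa01, exa02), 6))
```

This is why the intercept of the first model carries no information about a general shift between the two exams: with scaled variables it is zero by construction.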

Assessing team performativity with DuoTest

We assign a binary variable to each student group, that is, students of group 01 have 1 in the column called G01. Since group G09 seems to have the worst performance, it has 0 for each group variable and is treated as the baseline. Such a baseline allows us to add only 15 variables for the 16 groups.
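The coding scheme can be sketched as follows: every group except the baseline gets its own 0/1 column, and a student of the baseline group has 0 in all of them. The function name and the three example students are assumptions made for illustration.

```python
def dummy_code(groups, baseline, levels):
    """Return the dummy column names and one 0/1 row per student."""
    columns = [level for level in levels if level != baseline]
    rows = [[1 if group == col else 0 for col in columns] for group in groups]
    return columns, rows

levels = [f"G{i:02d}" for i in range(1, 17)]  # G01 ... G16
columns, rows = dummy_code(["G01", "G09", "G16"], baseline="G09", levels=levels)

print(len(columns))   # number of binary variables
print(sum(rows[1]))   # the baseline student has no 1 in any column
```

With this coding, the intercept of the model describes the baseline group, and each group coefficient measures the difference between that group and the baseline.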

Coefficients of the two models, with p-values in parentheses:

Variable        Model.01       Model.02            Effect
Intercept       0.00 (1.00)    -2.05 (0.000) ***
Scale (Exa01)   0.22 (0.08)    0.06 (0.418)
G01                            0.46 (0.243)        - -
G02                            1.69 (0) ***        -
G03                            2.63 (0) ***        + +
G04                            1.89 (0) ***        -
G05                            2.48 (0) ***        +
G06                            2.61 (0) ***        + +
G07                            2.24 (0) ***        +
G08                            2.09 (0) ***        +
G10                            1.57 (0) ***        -
G11                            2.33 (0) ***        +
G12                            2.47 (0) ***        +
G13                            1.11 (0.006) ***    - -
G14                            2.66 (0) ***        + +
G15                            3.25 (0) ***        + +
G16                            2.97 (0) ***        + +
Adj R2          0.03           0.71

The first thing worth noticing is the intercept, which is now statistically significant. The value of the intercept is negative, meaning that the second exam in itself was harder than the first one: aligning the views of all team members to find the right answer might be harder than expected, which would explain why some students who got 10 in the first exam got a lower grade in the second one.

To assess the performance of each team, we look at the coefficient of each group, which mitigates the negative effect of the intercept. For example, the coefficient of group 08 is 2.09, which compensates for the value of the intercept (-2.05). Thus, on average the team members of group 08 had slightly higher scores in the second exam.
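The arithmetic behind this reading can be made explicit. The sketch below combines the intercept, one group coefficient and the Exa01 coefficient of the second model into a predicted scaled Exa02 score for a student with an average Exa01 result (scaled Exa01 equal to 0). The coefficients are copied from the table of the two models; the function name is an assumption, and the result is expressed in standard deviations of Exa02.

```python
INTERCEPT = -2.05      # second model, baseline group G09
SLOPE_EXA01 = 0.06     # coefficient of the scaled Exa01 score
GROUP_COEF = {"G08": 2.09, "G09": 0.0, "G13": 1.11, "G15": 3.25}

def predicted_scaled_exa02(group, z_exa01=0.0):
    """Predicted scaled Exa02 score for a member of the given group."""
    return INTERCEPT + GROUP_COEF[group] + SLOPE_EXA01 * z_exa01

for group in GROUP_COEF:
    print(group, round(predicted_scaled_exa02(group), 2))
```

For group 08 the sum is -2.05 + 2.09 = 0.04, whereas for the baseline group G09 it stays at -2.05, which shows how much of the prediction is carried by the group coefficient rather than by the individual Exa01 score.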

The Adjusted R2 of the second model (0.71) is very good, and the coefficient of the first exam (0.06) is not statistically significant (p = 0.42).

This validates the hypothesis H2, which states that the team effect increases the explanatory power of our model.

Indeed, one could assume that the increase in the value of the R2 is the consequence of using more variables; but the Adjusted R2 adjusts the R2 of the model to take this effect into account. Moreover, the regression diagnostics in Appendix B do not indicate any further issues. Nonetheless, the analysis of the coefficients shows that we can neither confirm nor reject hypothesis H3, which states that the team has a positive effect on the individual performance.
The quantitative analysis is in line with the insights already visible in Figure 4.1: the coefficient of some groups (e.g. G03, G06 and G14) is greater than that of group 01, whereas some other groups experienced a weaker team effect (e.g. G09 and G13).

Exploring the value of formative assessment with DeTotus

A model that uses only Exa01 and Group appears to be able to describe the team performance, but it cannot really predict it in practice: the value of Exa01 can be obtained only a few minutes before students start working on Exa02, and the coefficients of each team can be obtained only once Exa02 is over.

We could try to increase the predictive power of the linear model by using the attendance and midterm scores to predict Exa02.

## 
## Call:
## lm(formula = scale(Exa02) ~ scale(Exa01) + Group + Attendance + 
##     MidTerm, data = new_scores)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.57123 -0.24546  0.03893  0.25862  0.87007 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -6.24413    2.81362  -2.219 0.031441 *  
## scale(Exa01)  0.06277    0.07841   0.801 0.427504    
## Group1        0.20715    0.42397   0.489 0.627452    
## Group2        1.77367    0.46361   3.826 0.000391 ***
## Group3        2.25689    0.51019   4.424 5.90e-05 ***
## Group4        1.78432    0.48185   3.703 0.000569 ***
## Group5        2.19043    0.44697   4.901 1.23e-05 ***
## Group6        2.38649    0.45814   5.209 4.34e-06 ***
## Group7        2.04633    0.40926   5.000 8.79e-06 ***
## Group8        1.98021    0.37802   5.238 3.93e-06 ***
## Group10       1.17788    0.46690   2.523 0.015167 *  
## Group11       1.94044    0.50847   3.816 0.000403 ***
## Group12       2.43232    0.47255   5.147 5.35e-06 ***
## Group13       0.75340    0.45939   1.640 0.107827    
## Group14       2.49948    0.37838   6.606 3.56e-08 ***
## Group15       3.04799    0.40923   7.448 1.95e-09 ***
## Group16       2.87989    0.40674   7.080 6.90e-09 ***
## Attendance    0.70190    0.54015   1.299 0.200262    
## MidTerm       0.10239    0.21253   0.482 0.632262    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5384 on 46 degrees of freedom
## Multiple R-squared:  0.7916, Adjusted R-squared:  0.7101 
## F-statistic: 9.709 on 18 and 46 DF,  p-value: 2.87e-10

Unfortunately, the Adjusted R2 does not change (0.71) and the two added variables are not statistically significant. This might be due to the fact that the relationship between MidTerm, Attendance and Exa02 is not linear.
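The Adjusted R2 in the summary above can be recomputed from the Multiple R-squared with the usual formula, 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The sketch below uses the printed values (R2 = 0.7916, 18 predictors, 46 residual degrees of freedom, hence n = 65); the function name is an assumption, and the last digit may differ slightly because the printed R2 is rounded.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Penalize R2 for the number of predictors in the model."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

n_predictors = 18
residual_df = 46
n_obs = residual_df + n_predictors + 1  # one extra parameter for the intercept

full_model = adjusted_r2(0.7916, n_obs, n_predictors)
print(round(full_model, 3))
```

Adding Attendance and MidTerm raises the number of predictors without a matching gain in R2, which is why the adjusted value stays at about 0.71.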

## 
## Call:
## lm(formula = scale(Exa02) ~ scale(Exa01) + Group + A5a + A9a, 
##     data = new_scores)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.62644 -0.18375  0.04423  0.22671  0.95267 
## 
## Coefficients: (1 not defined because of singularities)
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -2.29088    0.38557  -5.942 3.30e-07 ***
## scale(Exa01)  0.06582    0.07791   0.845  0.40251    
## Group1        0.57984    0.41239   1.406  0.16628    
## Group2        1.84209    0.42053   4.380 6.59e-05 ***
## Group3        2.20434    0.63728   3.459  0.00116 ** 
## Group4        1.58740    0.53822   2.949  0.00495 ** 
## Group5        2.05296    0.60874   3.372  0.00150 ** 
## Group6        2.38849    0.49772   4.799 1.66e-05 ***
## Group7        2.35877    0.41210   5.724 7.03e-07 ***
## Group8        1.97144    0.39401   5.004 8.31e-06 ***
## Group10       1.20853    0.56282   2.147  0.03696 *  
## Group11       1.90560    0.60177   3.167  0.00271 ** 
## Group12       2.58745    0.45148   5.731 6.86e-07 ***
## Group13       0.98905    0.41296   2.395  0.02066 *  
## Group14       2.60099    0.35872   7.251 3.41e-09 ***
## Group15       3.00115    0.45992   6.525 4.30e-08 ***
## Group16       2.67771    0.50749   5.276 3.29e-06 ***
## A5a           0.06064    0.06747   0.899  0.37335    
## A9a                NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5412 on 47 degrees of freedom
## Multiple R-squared:  0.7849, Adjusted R-squared:  0.7071 
## F-statistic: 10.09 on 17 and 47 DF,  p-value: 1.708e-10
## 
## Call:
## lm(formula = scale(Exa02) ~ Group + A5a * A9a, data = new_scores)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.61691 -0.20559  0.04283  0.23986  0.95302 
## 
## Coefficients: (1 not defined because of singularities)
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -2.40061    0.38662  -6.209 1.30e-07 ***
## Group1        0.09637    0.68060   0.142 0.888002    
## Group2        2.27962    0.63320   3.600 0.000763 ***
## Group3        4.09323    1.90517   2.148 0.036859 *  
## Group4        2.34701    0.87983   2.668 0.010448 *  
## Group5      -20.31173   22.46737  -0.904 0.370577    
## Group6        4.65253    2.21676   2.099 0.041235 *  
## Group7        1.93178    0.63487   3.043 0.003829 ** 
## Group8        0.95811    1.14476   0.837 0.406856    
## Group10       0.59989    0.87748   0.684 0.497552    
## Group11       4.72829    2.82033   1.677 0.100280    
## Group12       2.72317    0.43476   6.264 1.07e-07 ***
## Group13     -11.13184   12.22713  -0.910 0.367246    
## Group14       3.05727    0.54881   5.571 1.19e-06 ***
## Group15       3.00587    0.45575   6.595 3.36e-08 ***
## Group16       2.67749    0.50371   5.316 2.87e-06 ***
## A5a           2.10068    2.04885   1.025 0.310472    
## A9a                NA         NA      NA       NA    
## A5a:A9a      -0.02894    0.02902  -0.997 0.323788    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5396 on 47 degrees of freedom
## Multiple R-squared:  0.7862, Adjusted R-squared:  0.7088 
## F-statistic: 10.16 on 17 and 47 DF,  p-value: 1.505e-10
## 
## Call:
## lm(formula = scale(Exa02) ~ A5a * A9a, data = new_scores)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.22016 -0.45619 -0.03171  0.78090  1.75690 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -2.652103   1.023353  -2.592   0.0119 *
## A5a          0.278887   0.117116   2.381   0.0204 *
## A9a          0.029559   0.015077   1.961   0.0545 .
## A5a:A9a     -0.002543   0.001723  -1.476   0.1452  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9132 on 61 degrees of freedom
## Multiple R-squared:  0.2051, Adjusted R-squared:  0.1661 
## F-statistic: 5.248 on 3 and 61 DF,  p-value: 0.00275

Discussion and conclusions

This article started by using the metaphor of the Roman Testudo to describe how students learn to cooperate in order to deal with problems in their future careers. Our study suggests that what seems to be a single phenomenon (team performance) is in reality composed of assorted heterogeneous elements (Davis, 1971): team health, which depends on each team member, and transactivity, which influences the future performance of each team member and which we called “the omitted variable” in the title of the article. Accordingly, we wanted to look for new ways to design a final exam to assess individual and team performance in a team-based learning (TBL) course. Such an objective is relevant and persistent in the field of information systems, since TBL is increasingly used to teach university students how to work together and solve complex problems in a growing number of fields, and a structured and simple way to perform summative assessment was missing. Although our approach might be biased towards TBL as a form of teaching, our intent is to bridge forms of experiential learning with classic testing techniques such as written exams. We have selected and reviewed previous works from the fields of team-based learning, project-based learning and software solutions to assess students. Although such works are complementary, a paper that combines these three views to develop an artefact is missing. Therefore, we have decided to create a theory of design and action (Gregor, 2006), which explains how to do something and gives explicit prescriptions for teachers to construct a new type of final test for TBL classes, which we called DuoTest. Our preliminary findings show promising results that need to be replicated in other classes and other topics.
So far, DuoTest extends existing solutions for immediate feedback assessment (Maurer & Kropp, 2015), since it allows educators to obtain deeper insights into the effect of the team on the individual performance and into the effect of such individuals on the team, at a fraction of its cost. Nonetheless, future work should try to categorize the different types of transactivity performance, and to explain how to predict the coefficients of each team by using data collected during the semester to link together team health and transactivity.

Appendix

The complete dataset, before normalization and removal of row 43

UID Class Group A1a A1b A1 A2a A2b A2 A3a A3b A3 A4a A4b A4 A5a A5b A5 A6a A6b A6 A7a A7b A7 A8a A8b A8 A9a A9b A9 Attendance MidTerm Exa01 Exa02 Total.du.cours..Brut.
1 1 2 0 20.00 1.2 60 18.02 4.7 5 20.00 5.6 2 20.00 3.9 1 20.00 6.0 73 0.00 4.4 61 17.64 78.64 5 20.00 86.00 80 8 92.81 5.2 4.04 5.65 8.25 4.5
2 1 6 72 20.00 5.5 61 18.66 4.8 3 18.30 4.7 12 19.34 5.8 8 20.00 5.5 72 20.00 5.5 0 17.64 17.64 8 19.62 92.62 80 12 94.55 5.5 6.00 7.52 9.82 5.6
3 1 9 80 17.33 5.8 65 20.00 5.1 3 19.43 4.8 9 20.00 5.6 4 17.73 4.5 70 18.16 5.3 55 0.00 55.00 5 0.00 66.00 70 25 90.00 5.2 5.27 6.00 5.79 4.6
5 2 16 77 17.97 5.7 80 20.00 6.0 8 20.00 6.0 7 20.00 5.4 11 17.50 5.9 75 19.00 5.6 72 18.76 90.76 3 20.00 84.00 70 21 86.92 5.7 5.27 6.34 9.64 5.2
6 3 10 80 17.11 5.8 80 19.34 6.0 6 17.69 5.6 7 20.00 5.4 10 20.00 5.8 74 14.48 5.3 74 18.38 92.38 2 16.92 79.92 68 19 84.17 5.6 5.68 6.95 8.29 5.2
8 2 8 80 18.55 5.9 60 17.77 4.7 2 19.00 4.4 4 19.15 4.7 6 20.00 5.4 80 0.00 4.8 0 0.00 0.00 10 18.22 94.22 64 25 84.00 5.2 5.68 6.40 7.98 5.0
9 2 16 77 18.13 5.7 60 16.65 4.6 2 18.52 4.4 10 19.75 5.7 2 0.00 0.0 80 17.36 5.8 75 0.00 75.00 6 0.00 67.00 70 25 90.00 5.1 4.24 8.09 9.09 4.9
10 3 1 80 17.78 5.9 80 20.00 6.0 8 18.38 5.9 13 19.34 6.0 2 18.33 1.1 77 18.11 5.7 78 20.00 98.00 13 18.06 98.06 60 25 80.00 5.9 5.27 8.69 6.54 5.2
12 2 4 70 17.32 5.2 80 17.60 5.9 4 17.51 5.3 1 20.00 6.0 9 19.17 5.6 80 17.44 5.8 80 0.00 80.00 9 0.00 75.00 73 17 88.65 5.6 4.45 8.43 9.04 5.2
13 3 5 65 17.10 4.9 80 18.35 5.9 7 17.00 5.6 8 20.00 5.5 11 16.77 5.8 80 20.00 6.0 0 0.00 0.00 11 20.00 97.00 0 25 20.00 5.7 4.86 4.50 9.24 4.8
14 1 14 80 20.00 6.0 70 20.00 5.4 3 18.57 4.7 5 16.76 4.7 5 18.26 4.9 70 20.00 5.4 78 18.41 96.41 4 20.00 85.00 73 23 92.00 5.4 5.68 7.72 9.97 5.5
15 1 9 80 14.94 5.7 70 20.00 5.4 3 18.57 4.7 9 19.34 5.6 4 18.75 4.6 70 20.00 5.4 55 18.17 73.17 5 20.00 86.00 70 6 82.46 5.2 5.27 5.90 6.39 4.7
16 2 7 77 18.59 5.7 80 18.81 5.9 6 19.38 5.7 3 19.34 4.5 2 18.52 1.1 78 16.23 5.7 80 17.63 97.63 12 18.93 97.93 61 9 74.04 5.6 5.27 7.57 9.94 5.4
17 1 14 80 18.26 5.9 65 20.00 5.1 4 20.00 5.4 5 20.00 4.9 5 18.84 4.9 70 0.00 4.2 78 17.72 95.72 4 20.00 85.00 73 5 85.21 5.3 5.68 6.92 9.47 5.3
18 3 11 80 20.00 6.0 80 20.00 6.0 8 18.57 5.9 13 16.69 5.8 11 19.44 6.0 70 17.65 5.3 78 0.00 78.00 1 18.58 92.88 79 1 98.75 5.9 5.36 7.74 9.14 5.4
20 2 12 77 16.48 5.6 60 16.69 4.6 4 18.64 5.3 10 19.34 5.7 2 18.47 1.1 80 18.29 5.9 75 0.00 75.00 6 20.00 87.00 70 20 86.48 5.3 4.05 10.00 9.34 5.2
21 2 8 71 20.00 5.5 60 20.00 4.8 6 20.00 5.7 4 18.52 4.7 6 20.00 5.4 80 17.38 5.8 0 20.00 20.00 10 20.00 96.00 64 25 84.00 5.4 5.27 6.08 9.48 5.1
22 1 2 0 17.69 1.1 70 19.15 5.3 3 19.00 4.7 2 18.84 3.8 1 17.96 5.4 73 20.00 5.6 61 18.04 79.04 5 19.23 85.23 80 25 100.00 5.3 4.04 5.25 8.25 4.5
23 2 8 71 20.00 5.5 80 18.68 5.9 8 20.00 6.0 4 19.34 4.7 6 20.00 5.4 80 20.00 6.0 0 15.59 15.59 10 0.00 76.00 64 24 83.61 5.5 5.68 10.00 8.33 5.6
25 1 14 80 17.50 5.9 70 20.00 5.4 4 17.69 5.3 5 18.14 4.8 5 20.00 5.0 70 19.30 5.4 78 0.00 78.00 4 18.04 83.04 73 25 93.00 5.4 5.88 10.00 9.79 5.8
26 3 5 65 17.34 4.9 80 15.09 5.7 8 17.94 5.9 8 18.68 5.4 11 18.84 5.9 80 18.47 5.9 0 18.97 18.97 11 19.62 96.62 0 18 15.90 5.7 5.47 8.54 8.39 5.4
27 2 7 80 18.28 5.9 73 17.69 5.4 6 20.00 5.7 3 17.11 4.4 2 18.27 1.1 78 16.81 5.7 80 17.62 97.62 12 20.00 99.00 61 25 81.00 5.6 5.27 7.82 8.19 5.2
28 2 7 70 19.17 5.4 60 16.30 4.6 8 17.41 5.8 3 20.00 4.6 2 20.00 1.2 78 20.00 5.9 80 20.00 100.00 12 20.00 99.00 61 16 76.47 5.5 4.86 8.12 8.84 5.2
30 3 1 80 18.26 5.9 80 19.06 5.9 8 17.97 5.9 13 20.00 6.0 2 20.00 1.2 77 19.69 5.8 78 0.00 78.00 13 20.00 100.00 60 25 80.00 5.7 5.68 7.64 7.39 5.2
31 3 10 80 20.00 6.0 80 20.00 6.0 6 20.00 5.7 7 20.00 5.4 10 20.00 5.8 74 17.31 5.5 74 20.00 94.00 2 16.56 79.56 68 25 88.00 5.7 6.00 6.54 8.44 5.3
32 3 5 70 17.21 5.2 80 17.78 5.9 7 20.00 5.8 8 17.86 5.4 11 20.00 6.0 80 17.85 5.9 0 20.00 20.00 11 20.00 97.00 0 25 20.00 5.7 4.95 5.18 9.59 5.0
33 3 11 80 20.00 6.0 80 20.00 6.0 7 20.00 5.8 13 20.00 6.0 11 18.11 5.9 70 18.88 5.3 78 18.27 96.27 1 18.40 92.02 79 1 98.75 5.9 4.86 8.09 8.99 5.3
34 1 14 0 16.47 1.0 70 20.00 5.4 3 20.00 4.8 5 20.00 4.9 5 17.22 4.8 70 18.40 5.3 78 17.72 95.72 4 17.51 82.51 73 1 91.25 5.3 6.00 5.29 8.47 5.1
35 2 7 70 20.00 5.4 60 16.71 4.6 2 19.42 4.5 3 18.43 4.5 2 17.11 1.0 78 0.00 4.7 80 20.00 100.00 12 18.93 97.93 61 25 81.00 5.2 5.88 6.74 8.34 5.2
36 3 15 80 18.03 5.9 80 18.52 5.9 4 19.65 5.4 12 17.37 5.7 8 18.64 5.4 0 0.00 0.0 74 0.00 74.00 7 0.00 68.00 70 1 87.50 5.3 4.04 6.55 10.00 4.8
37 2 16 70 17.01 5.2 73 17.35 5.4 6 19.42 5.7 7 20.00 5.4 11 17.19 5.8 75 20.00 5.7 72 18.09 90.09 3 18.46 82.46 70 25 90.00 5.5 4.45 8.79 10.00 5.3
38 3 15 70 17.48 5.2 80 19.14 5.9 6 20.00 5.7 12 17.14 5.7 8 20.00 5.5 0 20.00 1.2 74 20.00 94.00 7 20.00 88.00 70 1 87.50 5.5 5.27 10.00 10.00 5.7
39 3 13 70 20.00 5.4 80 20.00 6.0 4 20.00 5.4 10 18.43 5.6 6 20.00 5.4 71 16.22 5.2 78 18.16 96.16 13 0.00 80.00 0 1 0.00 5.6 5.68 6.89 7.47 5.1
40 3 3 80 20.00 6.0 80 20.00 6.0 8 20.00 6.0 6 20.00 5.3 11 18.26 5.9 79 20.00 5.9 79 19.75 98.75 13 20.00 100.00 76 25 96.00 6.0 5.06 7.77 9.52 5.4
41 3 11 70 18.84 5.3 80 17.67 5.9 8 17.80 5.9 13 20.00 6.0 11 20.00 6.0 70 18.85 5.3 78 0.00 78.00 1 18.46 92.31 79 22 96.59 5.8 4.04 6.99 8.69 4.8
42 1 14 80 17.15 5.8 65 17.88 5.0 5 17.97 5.5 5 18.68 4.8 5 0.00 3.8 70 17.11 5.2 78 19.03 97.03 4 17.18 82.18 73 25 93.00 5.4 5.68 7.34 9.57 5.4
43 1 14 80 19.07 5.9 70 17.23 5.2 3 20.00 4.8 5 20.00 4.9 5 18.64 4.9 70 20.00 5.4 78 18.73 96.73 4 0.00 65.00 73 23 92.00 5.4 5.68 4.50 8.47 4.9
44 2 8 70 19.78 5.4 60 20.00 4.8 8 18.50 5.9 4 20.00 4.7 6 20.00 5.4 80 18.06 5.9 0 13.58 13.58 10 20.00 96.00 64 1 80.00 5.4 5.06 9.05 8.98 5.3
45 3 15 80 18.00 5.9 80 17.62 5.9 8 18.70 5.9 12 20.00 5.9 8 18.26 5.4 0 20.00 1.2 74 19.05 93.05 7 0.00 68.00 70 1 87.50 5.6 5.06 6.00 10.00 5.1
46 3 1 65 17.40 4.9 80 20.00 6.0 7 18.52 5.7 13 19.34 6.0 2 18.75 1.1 77 17.56 5.7 78 0.00 78.00 13 0.00 80.00 60 13 74.57 5.3 4.95 7.45 6.54 4.8
47 1 9 0 16.19 1.0 70 19.75 5.4 5 19.42 5.5 9 16.76 5.4 4 20.00 4.7 70 18.35 5.3 55 20.00 75.00 5 18.46 84.46 70 21 86.92 5.2 5.36 6.14 6.19 4.7
48 1 9 80 16.29 5.8 70 16.41 5.2 3 20.00 4.8 9 20.00 5.6 4 16.48 4.5 70 20.00 5.4 55 20.00 75.00 5 18.06 84.06 70 24 89.61 5.3 4.65 6.64 6.19 4.6
49 3 13 80 20.00 6.0 80 20.00 6.0 8 19.00 5.9 10 0.00 4.5 6 19.31 5.4 71 19.23 5.4 78 20.00 98.00 13 18.52 98.52 0 1 0.00 5.8 5.27 7.44 8.52 5.2
50 3 13 80 20.00 6.0 80 18.31 5.9 6 16.64 5.5 10 20.00 5.7 6 19.58 5.4 71 17.31 5.3 78 0.00 78.00 13 17.30 97.30 0 1 0.00 5.7 4.86 7.69 5.62 4.7
52 2 16 80 0.00 4.8 73 19.09 5.5 6 19.42 5.7 7 20.00 5.4 11 0.00 4.8 75 19.06 5.6 72 0.00 72.00 3 0.00 64.00 70 1 87.50 5.3 5.27 7.39 10.00 5.3
53 2 4 71 0.00 4.3 60 17.23 4.6 6 17.22 5.5 1 17.02 5.1 9 20.00 5.6 80 20.00 6.0 80 18.40 98.40 9 17.33 92.33 73 3 83.84 5.5 4.04 4.94 8.64 4.5
56 3 15 80 20.00 6.0 80 17.35 5.8 6 20.00 5.7 12 20.00 5.9 8 20.00 5.5 0 17.15 1.0 74 0.00 74.00 7 20.00 88.00 70 1 87.50 5.6 5.27 8.18 10.00 5.5
57 3 1 65 20.00 5.1 80 17.57 5.9 8 17.97 5.9 13 16.76 5.8 2 0.00 0.0 77 19.30 5.8 78 0.00 78.00 13 0.00 80.00 60 1 75.00 5.3 5.68 6.59 6.54 4.9
58 2 4 70 20.00 5.4 60 18.52 4.7 2 20.00 4.5 1 20.00 6.0 9 20.00 5.6 80 17.02 5.8 80 0.00 80.00 9 0.00 75.00 73 25 93.00 5.4 4.45 8.44 7.54 4.9
59 1 6 72 18.26 5.4 60 20.00 4.8 3 17.41 4.6 12 20.00 5.9 7 0.00 4.3 72 20.00 5.5 0 20.00 20.00 8 18.46 91.46 80 25 100.00 5.4 5.68 7.39 8.57 5.3
60 1 2 80 17.69 5.9 61 18.81 4.8 4 18.67 5.3 2 19.01 3.8 1 20.00 6.0 73 17.25 5.4 61 19.51 80.51 5 0.00 66.00 80 23 99.00 5.4 4.24 5.30 8.55 4.6
62 3 13 80 20.00 6.0 80 19.72 6.0 8 20.00 6.0 10 0.00 4.5 6 18.75 5.3 71 16.06 5.2 78 15.99 93.99 13 20.00 100.00 0 25 20.00 5.7 5.77 7.92 8.42 5.4
63 3 10 80 20.00 6.0 80 18.31 5.9 8 19.42 6.0 7 20.00 5.4 10 20.00 5.8 74 20.00 5.6 74 20.00 94.00 2 20.00 83.00 68 25 88.00 5.8 5.68 6.55 7.10 5.1
64 3 3 80 20.00 6.0 80 20.00 6.0 4 18.52 5.3 6 20.00 5.3 11 18.75 5.9 79 18.44 5.8 79 0.00 79.00 13 20.00 100.00 76 15 91.23 5.8 5.77 5.97 9.02 5.2
67 1 6 72 17.71 5.4 61 16.19 4.6 3 19.32 4.8 12 20.00 5.9 8 0.00 4.3 72 19.25 5.5 0 20.00 20.00 8 20.00 93.00 80 25 100.00 5.5 5.47 9.99 9.52 5.7
68 3 3 70 18.70 5.3 80 17.44 5.8 4 17.92 5.3 6 18.68 5.2 11 20.00 6.0 79 0.00 4.7 79 0.00 79.00 13 17.69 97.69 76 4 87.99 5.5 4.45 10.00 9.37 5.3
69 3 10 80 18.02 5.9 80 16.83 5.8 6 18.26 5.6 7 18.52 5.3 10 20.00 5.8 74 20.00 5.6 74 20.00 94.00 2 17.31 80.31 68 11 81.30 5.7 5.68 9.58 8.34 5.6
70 2 12 71 20.00 5.5 73 17.60 5.4 4 20.00 5.4 10 18.02 5.6 2 20.00 1.2 80 17.20 5.8 75 20.00 95.00 6 20.00 87.00 70 25 90.00 5.5 4.45 7.44 8.09 4.9
71 3 5 80 17.34 5.8 80 18.02 5.9 8 20.00 6.0 11 19.15 5.8 11 18.75 5.9 80 0.00 4.8 0 19.75 19.75 11 0.00 77.00 0 10 13.28 5.5 4.86 7.97 8.99 5.2
72 3 15 80 20.00 6.0 80 18.24 5.9 7 20.00 5.8 12 18.68 5.8 8 20.00 5.5 0 15.76 0.9 74 20.00 94.00 7 0.00 68.00 70 1 87.50 5.7 4.95 6.89 10.00 5.2
73 3 11 80 15.95 5.8 80 18.92 5.9 8 20.00 6.0 13 20.00 6.0 11 19.17 6.0 70 18.08 5.3 78 0.00 78.00 1 0.00 0.00 79 14 94.11 5.8 4.86 6.60 9.14 5.1
75 2 8 80 18.73 5.9 60 20.00 4.8 1 18.13 5.4 4 18.02 4.6 6 20.00 5.4 80 17.49 5.8 0 0.00 0.00 10 0.00 76.00 64 1 80.00 5.2 5.68 5.50 8.48 5.0
76 1 2 80 15.68 5.7 61 19.01 4.8 5 19.00 5.5 2 18.68 3.8 3 19.32 4.5 73 19.45 5.5 61 17.83 78.83 5 20.00 86.00 80 2 90.00 5.2 3.55 5.65 7.25 4.2
77 2 12 70 17.97 5.3 60 20.00 4.8 6 20.00 5.7 10 20.00 5.7 2 20.00 1.2 80 16.79 5.8 75 16.65 91.65 6 18.85 85.85 70 7 82.57 5.4 4.45 7.95 10.00 5.2
79 3 11 80 15.59 5.7 80 17.62 5.9 6 16.82 5.5 13 19.34 6.0 11 18.47 5.9 70 19.04 5.3 78 17.76 95.76 1 17.95 89.74 79 1 98.75 5.8 4.65 7.88 8.69 5.1

Diagnostic for the model with Exa 02 as a function of Exa 01

## 
## Call:
## lm(formula = scale(Exa02) ~ scale(Exa01), data = new_scores)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.6086 -0.4270  0.2016  0.7886  1.4031 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  -1.148e-16  1.220e-01   0.000   1.0000  
## scale(Exa01)  2.175e-01  1.230e-01   1.769   0.0818 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9838 on 63 degrees of freedom
## Multiple R-squared:  0.04731,    Adjusted R-squared:  0.03219 
## F-statistic: 3.128 on 1 and 63 DF,  p-value: 0.08178

##              Test stat Pr(>|Test stat|)
## scale(Exa01)                           
## Tukey test      1.0194            0.308

## named integer(0)
## named integer(0)

Diagnostic for the model with Exa 02 as a function of Group and Exa 01

## 
## Call:
## lm(formula = scale(Exa02) ~ scale(Exa01) + Group, data = new_scores)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.62610 -0.22442  0.04366  0.23241  0.95268 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -2.05016    0.27681  -7.406 1.76e-09 ***
## scale(Exa01)  0.06346    0.07771   0.817 0.418137    
## Group1        0.46093    0.38981   1.182 0.242849    
## Group2        1.68932    0.38388   4.401 5.99e-05 ***
## Group3        2.63170    0.42346   6.215 1.18e-07 ***
## Group4        1.89242    0.41690   4.539 3.79e-05 ***
## Group5        2.47805    0.38247   6.479 4.63e-08 ***
## Group6        2.61437    0.42875   6.098 1.78e-07 ***
## Group7        2.23981    0.38948   5.751 6.01e-07 ***
## Group8        2.09477    0.36861   5.683 7.62e-07 ***
## Group10       1.57441    0.38788   4.059 0.000181 ***
## Group11       2.33221    0.36916   6.318 8.19e-08 ***
## Group12       2.46999    0.43128   5.727 6.53e-07 ***
## Group13       1.11252    0.38867   2.862 0.006214 ** 
## Group14       2.66294    0.35133   7.580 9.56e-10 ***
## Group15       3.24595    0.36985   8.776 1.52e-11 ***
## Group16       2.96821    0.39048   7.601 8.86e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5401 on 48 degrees of freedom
## Multiple R-squared:  0.7812, Adjusted R-squared:  0.7083 
## F-statistic: 10.71 on 16 and 48 DF,  p-value: 7.319e-11

##              Test stat Pr(>|Test stat|)
## scale(Exa01)                           
## Group                                  
## Tukey test      0.4141           0.6788

##                  GVIF Df GVIF^(1/(2*Df))
## scale(Exa01) 1.324842  1        1.151018
## Group        1.324842 15        1.009421

## 34 
## 34
## 34 
## 34

Extra: Classifications and Confusion matrices

Beyond the adjusted R2, we can test how well a model trained on 70% of the total sample predicts the remaining 30%.
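A minimal sketch of such a 70/30 split, assuming a data frame with one row per student. The data frame `toy_scores` and its values are synthetic placeholders; only the names `trainingSample`, `trainingSet` and `testingSet` follow the code used elsewhere in this report, and the exact sampling call used for the real data is not shown here.

```r
# Synthetic stand-in for the real score table (65 students, as in the fitted models)
set.seed(1)
toy_scores <- data.frame(
  ID    = 1:65,
  Exa01 = round(runif(65, min = 3, max = 6), 2),
  Exa02 = sample(6:10, size = 65, replace = TRUE)
)

# Draw 70% of the row indices at random for training
n <- nrow(toy_scores)
trainingSample <- sample(seq_len(n), size = floor(0.7 * n))

# The remaining 30% of the rows form the testing set
trainingSet <- toy_scores[trainingSample, ]
testingSet  <- toy_scores[-trainingSample, ]

nrow(trainingSet)  # 45
nrow(testingSet)   # 20
```

With 65 students, this yields 45 training rows and 20 testing rows, which matches the sample sizes visible in the outputs of this section.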

The table below shows the ID of each student in the random testing sample (30% of the total sample), the predicted value (one decimal), the prediction rounded to the nearest integer, and the actual value.

## 
## Call:
## lm(formula = Exa02 ~ Exa01 + Group, data = trainingSet)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.89633 -0.31620  0.03918  0.31854  1.01444 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.86441    0.71799   8.168 6.84e-09 ***
## Exa01        0.04306    0.09802   0.439  0.66384    
## Group1       0.61762    0.60936   1.014  0.31947    
## Group2       1.64231    0.64919   2.530  0.01732 *  
## Group3       3.10978    0.65363   4.758 5.38e-05 ***
## Group4       2.06239    0.69031   2.988  0.00579 ** 
## Group5       2.94742    0.57981   5.083 2.21e-05 ***
## Group6       3.22544    0.90787   3.553  0.00137 ** 
## Group7       2.30742    0.67599   3.413  0.00197 ** 
## Group8       2.69440    0.67797   3.974  0.00045 ***
## Group10      2.09471    0.68476   3.059  0.00485 ** 
## Group11      2.80890    0.55805   5.033 2.53e-05 ***
## Group12      2.91451    0.62755   4.644 7.33e-05 ***
## Group13      1.32080    0.56119   2.354  0.02585 *  
## Group14      3.12583    0.51063   6.121 1.32e-06 ***
## Group15      3.74419    0.71493   5.237 1.45e-05 ***
## Group16      3.48859    0.56563   6.168 1.17e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7101 on 28 degrees of freedom
## Multiple R-squared:  0.7801, Adjusted R-squared:  0.6544 
## F-statistic: 6.207 on 16 and 28 DF,  p-value: 1.453e-05

ID Prediction Rounded.prediction Actual.value
4 8.8 9 8
5 8.9 9 9
7 8.1 8 9
10 9.4 9 9
14 7.7 8 8
21 9.9 10 10
22 6.8 7 7
24 8.2 8 7
25 8.5 8 10
26 9.9 10 10
28 6.2 6 6
31 9.0 9 9
35 9.4 9 10
45 9.2 9 9
47 8.2 8 8
49 7.7 8 9
57 8.8 9 8
60 9.4 9 9
61 9.9 10 10
62 8.5 8 8
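
The step from predictions to a cross-tabulation can be sketched as follows, using the values of the table above. The helper `getPredictionExa` is not shown in this report, so the rounding and tabulation below are an assumption about what it does; they do reproduce the Rounded.prediction column.

```r
# Values copied from the prediction table above
prediction <- c(8.8, 8.9, 8.1, 9.4, 7.7, 9.9, 6.8, 8.2, 8.5, 9.9,
                6.2, 9.0, 9.4, 9.2, 8.2, 7.7, 8.8, 9.4, 9.9, 8.5)
actual     <- c(8, 9, 9, 9, 8, 10, 7, 7, 10, 10,
                6, 9, 10, 9, 8, 9, 8, 9, 10, 8)

# Round each prediction to the nearest integer score
rounded <- round(prediction)

# Cross-tabulate with a shared set of levels:
# actual scores in the rows, rounded predictions in the columns
score_levels <- c(8, 9, 10, 7, 6)
confusion <- table(
  actual    = factor(actual,  levels = score_levels),
  predicted = factor(rounded, levels = score_levels)
)
confusion

# Share of students whose rounded prediction equals the actual score
accuracy <- mean(rounded == actual)
accuracy  # 0.65
```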

To assess the performance of our predictions, we use a so-called confusion matrix, which shows the real values in the rows and the predicted values in the columns.
If we look at the row for the score 10 in our confusion table, we see that 3 students with a final Exa02 score of 10 were correctly predicted, 1 was predicted to get 9 and got 10, and 1 was predicted to get 8 and got 10.
If we add all the values on the diagonal, we obtain the number of correctly predicted values (3+5+3+1+1 = 13). Dividing this number by the total number of values to predict gives the accuracy (13/20 = 0.65).

confMatrix_01 <- getPredictionExa(
  myTree = lm(Exa02~ Exa01+Group, trainingSet), testingSet = testingSet, actual_class= actual_class_2, isTree = FALSE
  )

confMatrix_01
## Confusion Matrix and Statistics
## 
##     
##      8 9 10 7 6
##   8  3 2  0 0 0
##   9  2 5  0 0 0
##   10 1 1  3 0 0
##   7  1 0  0 1 0
##   6  0 0  0 0 1
## 
## Overall Statistics
##                                           
##                Accuracy : 0.65            
##                  95% CI : (0.4078, 0.8461)
##     No Information Rate : 0.4             
##     P-Value [Acc > NIR] : 0.02103         
##                                           
##                   Kappa : 0.5189          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 8 Class: 9 Class: 10 Class: 7 Class: 6
## Sensitivity            0.4286   0.6250    1.0000   1.0000     1.00
## Specificity            0.8462   0.8333    0.8824   0.9474     1.00
## Pos Pred Value         0.6000   0.7143    0.6000   0.5000     1.00
## Neg Pred Value         0.7333   0.7692    1.0000   1.0000     1.00
## Precision              0.6000   0.7143    0.6000   0.5000     1.00
## Recall                 0.4286   0.6250    1.0000   1.0000     1.00
## F1                     0.5000   0.6667    0.7500   0.6667     1.00
## Prevalence             0.3500   0.4000    0.1500   0.0500     0.05
## Detection Rate         0.1500   0.2500    0.1500   0.0500     0.05
## Detection Prevalence   0.2500   0.3500    0.2500   0.1000     0.05
## Balanced Accuracy      0.6374   0.7292    0.9412   0.9737     1.00

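The accuracy value of 0.65 means that the model correctly predicts 65% of the values in the testing sample. The Kappa value of 0.52 measures the agreement between predicted and actual values after correcting for the agreement expected by chance (0 corresponds to chance-level agreement, 1 to perfect agreement).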
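
As a worked check, the accuracy and Kappa reported above can be recomputed by hand from the confusion matrix of the Exa01 + Group model. The variable names are illustrative, and the formula is the standard Cohen's kappa.

```r
# Confusion matrix of the Exa01 + Group model (rows: actual, columns: predicted)
score_levels <- c("8", "9", "10", "7", "6")
confusion <- matrix(
  c(3, 2, 0, 0, 0,
    2, 5, 0, 0, 0,
    1, 1, 3, 0, 0,
    1, 0, 0, 1, 0,
    0, 0, 0, 0, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(actual = score_levels, predicted = score_levels)
)

n <- sum(confusion)

# Observed agreement: the share of cases on the diagonal
p_observed <- sum(diag(confusion)) / n

# Agreement expected by chance, from the row and column totals
p_expected <- sum(rowSums(confusion) * colSums(confusion)) / n^2

# Cohen's kappa: agreement beyond chance, rescaled so that 1 is perfect agreement
cohen_kappa <- (p_observed - p_expected) / (1 - p_expected)

round(p_observed, 4)   # 0.65
round(cohen_kappa, 4)  # 0.5189
```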

To put this accuracy level in context, we compare it with the performance of a naive classifier, which always guesses the most frequent value in the testing dataset, namely 9.
This value occurs 7 times in the 20 rows of the testing dataset. Hence, a naive classifier that guesses the most frequent value every time would have an accuracy of 7/20 = 0.35.
It is worth noticing that the naive classifier (35%) is a more demanding benchmark than guessing uniformly at random among 10 possible scores (1/10 = 10%).
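
The naive baseline can be sketched as follows, using the actual Exa02 values of the 20 students in the testing sample. The variable names are illustrative.

```r
# Actual Exa02 scores of the testing sample, copied from the prediction table
actual <- c(8, 9, 9, 9, 8, 10, 7, 7, 10, 10,
            6, 9, 10, 9, 8, 9, 8, 9, 10, 8)

# Frequency of each score and the most frequent one
score_counts   <- table(actual)
majority_score <- as.numeric(names(score_counts)[which.max(score_counts)])

# Accuracy of always guessing the most frequent score
naive_accuracy <- mean(actual == majority_score)

# Accuracy of guessing uniformly at random among 10 possible scores
uniform_accuracy <- 1 / 10

majority_score  # 9
naive_accuracy  # 0.35
```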

Another benchmark for our model would be the model that uses only Exa01 to predict Exa02.

confMatrix_01_simple <- getPredictionExa(
  lm(Exa02~ Exa01, trainingSet),  testingSet = testingSet, actual_class_2, FALSE
  )

confMatrix_01_simple
## Confusion Matrix and Statistics
## 
##     
##      8 9 10 7 6
##   8  5 0  0 0 0
##   9  2 5  0 0 0
##   10 3 2  0 0 0
##   7  2 0  0 0 0
##   6  1 0  0 0 0
## 
## Overall Statistics
##                                         
##                Accuracy : 0.5           
##                  95% CI : (0.272, 0.728)
##     No Information Rate : 0.65          
##     P-Value [Acc > NIR] : 0.9468        
##                                         
##                   Kappa : 0.3007        
##                                         
##  Mcnemar's Test P-Value : NA            
## 
## Statistics by Class:
## 
##                      Class: 8 Class: 9 Class: 10 Class: 7 Class: 6
## Sensitivity            0.3846   0.7143        NA       NA       NA
## Specificity            1.0000   0.8462      0.75      0.9     0.95
## Pos Pred Value         1.0000   0.7143        NA       NA       NA
## Neg Pred Value         0.4667   0.8462        NA       NA       NA
## Precision              1.0000   0.7143      0.00      0.0     0.00
## Recall                 0.3846   0.7143        NA       NA       NA
## F1                     0.5556   0.7143        NA       NA       NA
## Prevalence             0.6500   0.3500      0.00      0.0     0.00
## Detection Rate         0.2500   0.2500      0.00      0.0     0.00
## Detection Prevalence   0.2500   0.3500      0.25      0.1     0.05
## Balanced Accuracy      0.6923   0.7802        NA       NA       NA

As shown above, the accuracy of the model that does not take the group effect into account is lower than that of the model that does (0.50 vs. 0.65).

A model that uses only Exa01 and Group appears to describe the team performance, but it cannot really predict it in practice: the value of Exa01 is available only a few minutes before students start working on Exa02, and the coefficient of each team can be estimated only once Exa02 is over.

We could try to increase the predictive power of the linear model by adding attendance and the midterm score as predictors of Exa02.

confMatrix_01_full_group <- getPredictionExa(
  lm(Exa02~ Exa01+Attendance+MidTerm+Group, trainingSet),  testingSet = testingSet, actual_class_2, FALSE
  )

confMatrix_01_full_group
## Confusion Matrix and Statistics
## 
##     
##      8 9 10 7 6
##   8  2 3  0 0 0
##   9  2 5  0 0 0
##   10 0 2  3 0 0
##   7  1 0  0 1 0
##   6  0 0  0 0 1
## 
## Overall Statistics
##                                           
##                Accuracy : 0.6             
##                  95% CI : (0.3605, 0.8088)
##     No Information Rate : 0.5             
##     P-Value [Acc > NIR] : 0.2517          
##                                           
##                   Kappa : 0.4425          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 8 Class: 9 Class: 10 Class: 7 Class: 6
## Sensitivity              0.40   0.5000    1.0000   1.0000     1.00
## Specificity              0.80   0.8000    0.8824   0.9474     1.00
## Pos Pred Value           0.40   0.7143    0.6000   0.5000     1.00
## Neg Pred Value           0.80   0.6154    1.0000   1.0000     1.00
## Precision                0.40   0.7143    0.6000   0.5000     1.00
## Recall                   0.40   0.5000    1.0000   1.0000     1.00
## F1                       0.40   0.5882    0.7500   0.6667     1.00
## Prevalence               0.25   0.5000    0.1500   0.0500     0.05
## Detection Rate           0.10   0.2500    0.1500   0.0500     0.05
## Detection Prevalence     0.25   0.3500    0.2500   0.1000     0.05
## Balanced Accuracy        0.60   0.6500    0.9412   0.9737     1.00

The accuracy of this model is lower than that of the model using only Exa01 and Group (0.60 vs. 0.65).

Moving beyond linear models, the decision tree that uses Attendance, Midterm and Exa01 to predict Exa02 shows some interesting features.
As expected, if attendance is low, the final score of Exa02 is low.
Surprisingly, if the midterm score is high, Exa02 is fairly low; this could be because students who care most about passing the course do not put in additional effort once their midterm score is high enough.
Also, high Exa01 leads to lower Exa02; this could be an indication that students who care about individual performance do not perform well in a team.

set.seed(1)

getPredictionExa(
  rpart(Exa02_tree ~ Attendance+MidTerm+Exa01 ,
   method="anova",
   data= trainingSet
   ),  testingSet = testingSet, actual_class_2, isTree = TRUE
)

## Confusion Matrix and Statistics
## 
##     
##      8 9 10 7 6
##   8  5 0  0 0 0
##   9  0 7  0 0 0
##   10 1 4  0 0 0
##   7  2 0  0 0 0
##   6  1 0  0 0 0
## 
## Overall Statistics
##                                           
##                Accuracy : 0.6             
##                  95% CI : (0.3605, 0.8088)
##     No Information Rate : 0.55            
##     P-Value [Acc > NIR] : 0.4143          
##                                           
##                   Kappa : 0.4245          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 8 Class: 9 Class: 10 Class: 7 Class: 6
## Sensitivity            0.5556   0.6364        NA       NA       NA
## Specificity            1.0000   1.0000      0.75      0.9     0.95
## Pos Pred Value         1.0000   1.0000        NA       NA       NA
## Neg Pred Value         0.7333   0.6923        NA       NA       NA
## Precision              1.0000   1.0000      0.00      0.0     0.00
## Recall                 0.5556   0.6364        NA       NA       NA
## F1                     0.7143   0.7778        NA       NA       NA
## Prevalence             0.4500   0.5500      0.00      0.0     0.00
## Detection Rate         0.2500   0.3500      0.00      0.0     0.00
## Detection Prevalence   0.2500   0.3500      0.25      0.1     0.05
## Balanced Accuracy      0.7778   0.8182        NA       NA       NA

Since attendance seems to be a reliable indicator, we assess it in more detail and replace it with the weekly evaluations. The model shows some interesting features:

Since the overall score of each week seems promising, we assess it in more detail and replace it with detailed scores for (a) team and (b) individual performance. The Class and Group components are included in the model but they do not appear in the decision tree.

confMatrix_02 <- getPredictionExa(
  rpart(Exa02_tree ~           
          # Class+Group
          # +A1+A2+A3+A4+A5+A6+A7+A8+A9
        +MidTerm+Exa01
        +A1a+A1b+A2a+A2b+A3a+A3b+A4a+A4b+A5a+A5b+A6a+A6b+A7a+A7b+A8a+A8b+A9a+A9b,
   method="anova",
   data= trainingSet
   ), testingSet = testingSet, actual_class_2, TRUE
)

confMatrix_02
## Confusion Matrix and Statistics
## 
##     
##      8 9 10 7 6
##   8  5 0  0 0 0
##   9  3 1  3 0 0
##   10 1 2  2 0 0
##   7  2 0  0 0 0
##   6  1 0  0 0 0
## 
## Overall Statistics
##                                           
##                Accuracy : 0.4             
##                  95% CI : (0.1912, 0.6395)
##     No Information Rate : 0.6             
##     P-Value [Acc > NIR] : 0.979           
##                                           
##                   Kappa : 0.1837          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 8 Class: 9 Class: 10 Class: 7 Class: 6
## Sensitivity            0.4167   0.3333      0.40       NA       NA
## Specificity            1.0000   0.6471      0.80      0.9     0.95
## Pos Pred Value         1.0000   0.1429      0.40       NA       NA
## Neg Pred Value         0.5333   0.8462      0.80       NA       NA
## Precision              1.0000   0.1429      0.40      0.0     0.00
## Recall                 0.4167   0.3333      0.40       NA       NA
## F1                     0.5882   0.2000      0.40       NA       NA
## Prevalence             0.6000   0.1500      0.25      0.0     0.00
## Detection Rate         0.2500   0.0500      0.10      0.0     0.00
## Detection Prevalence   0.2500   0.3500      0.25      0.1     0.05
## Balanced Accuracy      0.7083   0.4902      0.60       NA       NA

The decision tree model that also uses the class and group numbers reaches a slightly higher accuracy (0.45 vs. 0.40).

confMatrix_02_group <- getPredictionExa(
  rpart(Exa02_tree ~           
          Class+Group
          # +A1+A2+A3+A4+A5+A6+A7+A8+A9
        +MidTerm+Exa01
        +A1a+A1b+A2a+A2b+A3a+A3b+A4a+A4b+A5a+A5b+A6a+A6b+A7a+A7b+A8a+A8b+A9a+A9b,
   method="anova",
   data= trainingSet
   ), testingSet = testingSet, actual_class_2, TRUE
)

confMatrix_02_group
## Confusion Matrix and Statistics
## 
##     
##      8 9 10 7 6
##   8  1 2  0 2 0
##   9  2 2  2 1 0
##   10 0 1  4 0 0
##   7  0 0  0 2 0
##   6  0 0  0 1 0
## 
## Overall Statistics
##                                           
##                Accuracy : 0.45            
##                  95% CI : (0.2306, 0.6847)
##     No Information Rate : 0.3             
##     P-Value [Acc > NIR] : 0.1133          
##                                           
##                   Kappa : 0.2857          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 8 Class: 9 Class: 10 Class: 7 Class: 6
## Sensitivity            0.3333   0.4000    0.6667   0.3333       NA
## Specificity            0.7647   0.6667    0.9286   1.0000     0.95
## Pos Pred Value         0.2000   0.2857    0.8000   1.0000       NA
## Neg Pred Value         0.8667   0.7692    0.8667   0.7778       NA
## Precision              0.2000   0.2857    0.8000   1.0000     0.00
## Recall                 0.3333   0.4000    0.6667   0.3333       NA
## F1                     0.2500   0.3333    0.7273   0.5000       NA
## Prevalence             0.1500   0.2500    0.3000   0.3000     0.00
## Detection Rate         0.0500   0.1000    0.2000   0.1000     0.00
## Detection Prevalence   0.2500   0.3500    0.2500   0.1000     0.05
## Balanced Accuracy      0.5490   0.5333    0.7976   0.6667       NA

# Adding a flag with the same rules as the decision tree
clusters <- new_scores%>%
                        group_by(Group)%>%
                        dplyr::summarise(median(Exa02))

clusters$ClusterID <- ifelse(clusters$`median(Exa02)`>9.5,3,ifelse(clusters$`median(Exa02)`>8.4,2,1))

# Adding the flag to the dataset
new_scores_clusters <- new_scores%>%
  left_join(clusters, by=c("Group" = "Group"))

# Adding the column in the training and testing samples
trainingSet <- new_scores_clusters[trainingSample,-1]
testingSet <- new_scores_clusters[-trainingSample,-1]


# Predicting the group cluster
confMatrix_03 <- getPredictionExa(
  rpart(ClusterID ~           
          Class
          # +Group
          # +A1+A2+A3+A4+A5+A6+A7+A8+A9
        +MidTerm
        +Exa01
        +A1a+A1b+A2a+A2b+A3a+A3b+A4a+A4b+A5a+A5b+A6a+A6b+A7a+A7b+A8a+A8b+A9a+A9b,
   method="anova",
   data= trainingSet
   ), testingSet = testingSet, testingSet$ClusterID, TRUE
)

confMatrix_03
## Confusion Matrix and Statistics
## 
##    
##     2 3 1
##   2 8 1 0
##   3 3 1 1
##   1 0 2 4
## 
## Overall Statistics
##                                           
##                Accuracy : 0.65            
##                  95% CI : (0.4078, 0.8461)
##     No Information Rate : 0.55            
##     P-Value [Acc > NIR] : 0.252           
##                                           
##                   Kappa : 0.4422          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 2 Class: 3 Class: 1
## Sensitivity            0.7273   0.2500   0.8000
## Specificity            0.8889   0.7500   0.8667
## Pos Pred Value         0.8889   0.2000   0.6667
## Neg Pred Value         0.7273   0.8000   0.9286
## Precision              0.8889   0.2000   0.6667
## Recall                 0.7273   0.2500   0.8000
## F1                     0.8000   0.2222   0.7273
## Prevalence             0.5500   0.2000   0.2500
## Detection Rate         0.4000   0.0500   0.2000
## Detection Prevalence   0.4500   0.2500   0.3000
## Balanced Accuracy      0.8081   0.5000   0.8333

Using the two items that best predict the cluster ID to predict Exa02

## Confusion Matrix and Statistics
## 
##     
##      8 9 10 7 6
##   8  5 0  0 0 0
##   9  3 1  3 0 0
##   10 1 3  1 0 0
##   7  2 0  0 0 0
##   6  1 0  0 0 0
## 
## Overall Statistics
##                                           
##                Accuracy : 0.35            
##                  95% CI : (0.1539, 0.5922)
##     No Information Rate : 0.6             
##     P-Value [Acc > NIR] : 0.9935          
##                                           
##                   Kappa : 0.1096          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 8 Class: 9 Class: 10 Class: 7 Class: 6
## Sensitivity            0.4167   0.2500    0.2500       NA       NA
## Specificity            1.0000   0.6250    0.7500      0.9     0.95
## Pos Pred Value         1.0000   0.1429    0.2000       NA       NA
## Neg Pred Value         0.5333   0.7692    0.8000       NA       NA
## Precision              1.0000   0.1429    0.2000      0.0     0.00
## Recall                 0.4167   0.2500    0.2500       NA       NA
## F1                     0.5882   0.1818    0.2222       NA       NA
## Prevalence             0.6000   0.2000    0.2000      0.0     0.00
## Detection Rate         0.2500   0.0500    0.0500      0.0     0.00
## Detection Prevalence   0.2500   0.3500    0.2500      0.1     0.05
## Balanced Accuracy      0.7083   0.4375    0.5000       NA       NA

As a comparison between the two models:

Controlling for different seed numbers

Using pmax to find the maximum accuracy of each row; the Check column is TRUE when the decision tree reaches that maximum.
Seed KPI Exa02~Exa01 Exa02~Exa01+Group Exa02~Weekly.Evaluation(Decision.tree) Check
1 Accuracy 0.50 0.65 0.40 FALSE
2 Accuracy 0.50 0.65 0.40 FALSE
3 Accuracy 0.20 0.60 0.45 FALSE
4 Accuracy 0.40 0.55 0.55 TRUE
5 Accuracy 0.45 0.55 0.55 TRUE
6 Accuracy 0.40 0.70 0.30 FALSE
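
A minimal sketch of the row-wise comparison with `pmax`, using the six rows of the table above. The column names are illustrative, and the rule behind Check (the decision tree reaches the row maximum) is an assumption that is consistent with the values shown.

```r
# Accuracies copied from the table above (one row per seed)
acc <- data.frame(
  Seed         = 1:6,
  linear       = c(0.50, 0.50, 0.20, 0.40, 0.45, 0.40),
  linear_group = c(0.65, 0.65, 0.60, 0.55, 0.55, 0.70),
  tree         = c(0.40, 0.40, 0.45, 0.55, 0.55, 0.30)
)

# pmax returns the element-wise (row-wise) maximum across the three columns
acc$best <- pmax(acc$linear, acc$linear_group, acc$tree)

# Assumed rule: flag the seeds for which the tree matches the best accuracy
acc$Check <- acc$tree == acc$best

acc
```

Under this rule the decision tree ties with the best model only for seeds 4 and 5.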

Testing different training and testing sets

Seed.Train Linear Linear.Group Tree
1 0.350 0.610 0.435
2 0.355 0.630 0.485
3 0.350 0.645 0.360
4 0.355 0.595 0.375
5 0.355 0.550 0.500
6 0.305 0.575 0.430
7 0.355 0.570 0.415
8 0.355 0.575 0.345
9 0.345 0.590 0.375
10 0.345 0.570 0.265
11 0.355 0.585 0.525
12 0.345 0.625 0.395
13 0.350 0.585 0.375
14 0.375 0.560 0.390
15 0.350 0.635 0.325
16 0.355 0.560 0.500
17 0.340 0.610 0.360
18 0.355 0.580 0.455
19 0.355 0.560 0.420
20 0.355 0.600 0.475
21 0.330 0.605 0.500
22 0.375 0.565 0.535
23 0.355 0.580 0.375
24 0.335 0.615 0.380
25 0.370 0.500 0.385
26 0.350 0.615 0.535
27 0.315 0.590 0.490
28 0.320 0.595 0.385
29 0.325 0.595 0.410
30 0.350 0.575 0.525
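
The repeated evaluation over different training and testing sets can be sketched as below. This is not the code behind the table: the data are synthetic, the split is stratified by group (an assumption made here so that every group appears in the training set and `predict` never meets an unseen factor level), and the decision tree column is omitted to keep the example in base R.

```r
# Synthetic data: 5 groups of 13 students, with a group effect on the team exam
set.seed(42)
n_groups   <- 5
group_size <- 13
toy <- data.frame(
  Group = factor(rep(seq_len(n_groups), each = group_size)),
  Exa01 = round(runif(n_groups * group_size, min = 3, max = 6), 1)
)
group_effect <- c(6, 7, 8, 9, 10)
noisy_score  <- group_effect[as.integer(toy$Group)] + rnorm(nrow(toy), sd = 0.7)
toy$Exa02    <- pmin(10, pmax(1, round(noisy_score)))

# Accuracy of a fitted model: share of rounded predictions equal to the actual score
accuracy_of <- function(model, testingSet) {
  mean(round(predict(model, newdata = testingSet)) == testingSet$Exa02)
}

# Repeat the split and the evaluation for 30 different seeds
results <- do.call(rbind, lapply(1:30, function(seed) {
  set.seed(seed)
  # Stratified 70/30 split: sample 70% of the rows within each group
  trainingSample <- unlist(lapply(
    split(seq_len(nrow(toy)), toy$Group),
    function(rows) sample(rows, size = floor(0.7 * length(rows)))
  ))
  trainingSet <- toy[trainingSample, ]
  testingSet  <- toy[-trainingSample, ]
  data.frame(
    Seed.Train   = seed,
    Linear       = accuracy_of(lm(Exa02 ~ Exa01, data = trainingSet), testingSet),
    Linear.Group = accuracy_of(lm(Exa02 ~ Exa01 + Group, data = trainingSet), testingSet)
  )
}))

head(results)
colMeans(results[, c("Linear", "Linear.Group")])
```

Averaging over many seeds, as in the table above, reduces the risk of judging a model on a single lucky or unlucky split.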