WELCOME TO OUR PRESENTATION

If the video does not load automatically, you can find it here: How to assess online the individual performance in team projects?

Our article is available here: link

Corresponding author: Riccardo Bonazzi, University of Applied Sciences Western Switzerland (HES-SO)


We have decided to see the current situation as an opportunity to try a new presentation format that combines (a) a short video (2 minutes) and (b) a set of slides that describe our idea in more detail. Hence, this storyboard illustrates our algorithm in a way that is …

At the end of the dashboard, we share some of the new insights that we are obtaining from the data of this semester.

02) PROBLEM STATEMENT


In this article we study learning analytics for experiential learning.

Team projects can teach students how to work together, but assessing each individual contribution in a team project is hard, and no existing tool can easily assess group effects on individual performance.

To solve this problem, we assess “performativity”, which measures how much students build on each other’s contributions, and we ask the following research question: How can performativity be rapidly assessed in project-based learning scenarios?

In this presentation, we want to improve the immediate feedback assessment technique (IF-AT) by developing an online solution that allows students to take the final exam by themselves and then to earn partial credit if they manage to correct their mistakes by discussing them with their team members. This way, we can measure the degree of performativity in each team.
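The partial-credit idea can be sketched in a few lines of R. The scoring rule below is an assumption made for illustration only: the source does not state how much a corrected answer is worth, so the function name `score_duo` and the value `partial_credit = 0.5` are hypothetical.

```r
# Hypothetical scoring rule for one student on a two-pass exam.
# individual_ok / team_ok: logical vectors with one entry per question.
score_duo <- function(individual_ok, team_ok, partial_credit = 0.5) {
  stopifnot(length(individual_ok) == length(team_ok))
  # Full credit when the individual answer was already correct,
  # partial credit when only the team answer is correct, zero otherwise.
  ifelse(individual_ok, 1, ifelse(team_ok, partial_credit, 0))
}

# Illustrative answers for four questions (not data from the study).
individual_ok <- c(TRUE, FALSE, FALSE, TRUE)
team_ok       <- c(TRUE, TRUE,  FALSE, TRUE)

points <- score_duo(individual_ok, team_ok)
total  <- sum(points)  # 1 + 0.5 + 0 + 1
```

Under this assumed rule, the second question is the one where the team discussion helped the student recover part of the credit.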

03) DESIGN SCIENCE METHODOLOGY


We position our study in the field of design science research and we describe how we developed an artefact in the shape of a prototype, by following the guidelines of Peffers et al. (2007). This presentation has been built around the 6 steps suggested by the authors.

(i - ii) Identify the problem and define the objectives of the solution –> Slide 02

(iii) Design and development of the artefact –> Slide 04

(iv) Demonstration of our methodology –> Slide 05

(v) Evaluation (preliminary data) –> Slide 06

(vi) Communication: besides presenting our results at HICSS, we have shared our preliminary insights with colleagues whose courses have been disrupted by the Covid-19 situation.

04) DESIGN AND DEVELOPMENT: The DuoTest prototype to assess group effects


The underlying idea of DuoTest is simple: to allow students to do their final exams twice in a row.

The first time, participants do their exam individually (Exa01); the second time, they solve the same exam in groups (Exa02).

By comparing individual and team performances, the system infers the positive (or negative) effect of each group on the individual performances.
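One simple way to read this comparison is the average gain per team between the two passes. The sketch below uses made-up scores and a plain mean of differences; it is not the authors' analysis, which relies on a linear regression (see the appendix output).

```r
# Illustrative scores only; these are not the data from the study.
scores <- data.frame(
  Group = c("A", "A", "A", "B", "B", "B"),
  Exa01 = c(4.0, 4.5, 5.0, 4.0, 4.5, 5.0),  # individual pass
  Exa02 = c(5.0, 5.0, 5.5, 3.5, 4.5, 4.5)   # team pass
)

# Gain of each student between the individual and the team pass.
scores$Gain <- scores$Exa02 - scores$Exa01

# Average gain per team: a positive value suggests the team lifted
# its members, a negative value suggests it pulled them down.
group_effect <- tapply(scores$Gain, scores$Group, mean)
group_effect
```

In this toy data, team A shows a positive group effect and team B a negative one.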

Previous studies have already used Rapid Assessment Tests at both the individual and the group level; nonetheless, they did not use those tests during the final evaluation and they did not try to minimize the effort required to perform such assessments.

05) DEMONSTRATION: OUR ARTEFACT


The figure illustrates in detail how the DuoTest can be built with an open-source learning management system (Moodle) and how the data can be analyzed with RStudio to assess team health and transactivity.

This example comes from 70 bachelor students who took the exam in January 2020 after having done a group project with a firm during the fall semester.

The overall test can be implemented with open-source technology (we used Moodle for our demonstration). This type of test takes about 4 hours to design and, once the rules are set, it is corrected automatically, like a multiple-choice questionnaire (QCM).

06) EVALUATION: PRELIMINARY DATA ANALYSIS


We tested our prototype with three classes of undergraduate students taking the same course, for a total of 71 students attending the final exam in Sierre (Switzerland) on 20 January 2020.

We claim that the exam was (a) valid, since chosen questions provide useful information about the concepts seen in class, (b) reliable, thanks to the rule-driven correction of each question, and (c) recognizable, since it fully replicated the way students work during the semester.

The second graph shows that we can assign a trendline to each team, which describes the positive or negative group effect. The Adjusted R2 of the linear regression is 0.70, so the model explains a large share of the variance in the scores.

In the next slides we show the detailed analysis. We take group 9 as reference, since it has the lowest score.
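Choosing a reference group can be sketched with `relevel()`, the same function that appears in the appendix call. The data frame below is made up for illustration (three groups instead of sixteen, and no scaling), so the coefficients are not those of the study.

```r
# Illustrative scores only; these are not the data from the study.
final_scores <- data.frame(
  Group = c("8", "8", "8", "9", "9", "9", "10", "10", "10"),
  Exa01 = c(4.0, 4.5, 5.0, 4.0, 4.5, 5.0, 4.0, 4.5, 5.0),
  Exa02 = c(5.0, 5.2, 5.6, 3.9, 4.1, 4.6, 4.6, 4.7, 5.2)
)

# Make group 9 the baseline, so every group coefficient is read as
# "difference from group 9 at the same individual score".
final_scores$Group <- relevel(as.factor(final_scores$Group), ref = "9")

fit <- lm(Exa02 ~ Exa01 + Group, data = final_scores)
coef(fit)  # Group8 and Group10 are offsets relative to group 9
```

Because group 9 has the lowest scores, all other group coefficients come out positive, which makes the table of effects easier to read.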

07) COMMUNICATION: IF WE CHANGE THE FINAL EXAM, DO WE STILL NEED A MIDTERM EVALUATION?


The decision tree model shows a set of rules to predict the Exa02 scores. Let’s try to assess its predictive power.

In this table, the predicted values are shown as rows and the real values as columns. On the diagonal of the table, we can see how many times the system has correctly guessed the scores of the students.

       9  8 10  7
  9    5  1  2  0
  8    1  3  1  3
  10   1  1  1  1
  7    0  0  0  0

The accuracy of the model can be calculated as the sum of the values on the diagonal divided by the sum of all the values in the table.

Accuracy 
    0.45 
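The accuracy can be recomputed by hand from the table. The sketch below re-enters the counts in base R; the object names are chosen here for illustration.

```r
# Confusion matrix from the table: predicted scores as rows,
# real scores as columns.
labels <- c("9", "8", "10", "7")
confusion <- matrix(
  c(5, 1, 2, 0,
    1, 3, 1, 3,
    1, 1, 1, 1,
    0, 0, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(Predicted = labels, Real = labels)
)

# Accuracy = correct predictions (diagonal) / all predictions.
accuracy <- sum(diag(confusion)) / sum(confusion)
accuracy  # 9 / 20 = 0.45
```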

Is this value reliable? Since we took a random sample of 70% to train the algorithm and 30% to test it, the accuracy can change if we change the training and testing samples. The accuracy p-value tests whether the model does better than always predicting the most frequent score.
The p-value is not very good, probably because we do not have enough data to obtain a very reliable estimate.

AccuracyPValue 
     0.2376224 
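The reported p-value is consistent with a one-sided binomial test of the accuracy against the "no-information rate", that is, the share of the most frequent real score. This is how the `confusionMatrix()` function of the caret package computes `AccuracyPValue`; that the authors used caret is an assumption based on the output label. A base R sketch:

```r
# Same counts as the table: predicted scores as rows, real as columns.
confusion <- matrix(
  c(5, 1, 2, 0,
    1, 3, 1, 3,
    1, 1, 1, 1,
    0, 0, 0, 0),
  nrow = 4, byrow = TRUE
)

n_correct <- sum(diag(confusion))  # 9 correct predictions
n_total   <- sum(confusion)        # 20 students in the test sample

# No-information rate: accuracy obtained by always predicting the
# most frequent real score (largest column total).
nir <- max(colSums(confusion)) / n_total  # 7 / 20 = 0.35

# One-sided test: is the observed accuracy greater than the NIR?
p_value <- binom.test(n_correct, n_total, p = nir,
                      alternative = "greater")$p.value
p_value  # about 0.2376
```

With only 20 students in the test sample, an accuracy of 0.45 is not distinguishable from the 0.35 baseline.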

CONCLUSION: CAN WE SHIFT AWAY FROM MIDTERM EXAMS FOR TBL? A QUICK VIEW OF PRELIMINARY DATA FROM 2020


This year, the students did not have a midterm exam; instead, they had peer evaluations every week during their blended learning course.

The first graph shows that the peer review scores are approximately normally distributed between 1.0/6.0 and 6.0/6.0.

The second graph shows the evolution over time of the weekly scores.
We chose a violin plot instead of a boxplot to better show the distribution of scores (link here to see why).

Week 08 and week 12 were Q&A sessions and do not have a score; week 13 is a simulation of the exam, which can be used to test how well our model predicts individual scores.

APPENDIX: USING OTHER DATA FROM THE SEMESTER TO PREDICT THE FINAL SCORE (the DETOTUS approach)


Call:
lm(formula = scale(Exa02) ~ scale(Exa01) + relevel(as.factor(Group), 
    ref = "9") + lowAttendance, data = final_scores)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.62263 -0.18849  0.03382  0.22472  0.86861 

Coefficients:
                                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)                            -2.18969    0.27916  -7.844 4.36e-10 ***
scale(Exa01)                            0.03951    0.07667   0.515 0.608799    
relevel(as.factor(Group), ref = "9")1   0.12268    0.41865   0.293 0.770777    
relevel(as.factor(Group), ref = "9")2   1.55657    0.38007   4.096 0.000165 ***
relevel(as.factor(Group), ref = "9")3   2.29889    0.44744   5.138 5.27e-06 ***
relevel(as.factor(Group), ref = "9")4   1.54871    0.44384   3.489 0.001063 ** 
relevel(as.factor(Group), ref = "9")5   2.12212    0.41624   5.098 6.03e-06 ***
relevel(as.factor(Group), ref = "9")6   2.28810    0.45089   5.075 6.53e-06 ***
relevel(as.factor(Group), ref = "9")7   2.02183    0.39589   5.107 5.85e-06 ***
relevel(as.factor(Group), ref = "9")8   1.94661    0.36708   5.303 3.00e-06 ***
relevel(as.factor(Group), ref = "9")10  1.23299    0.41765   2.952 0.004912 ** 
relevel(as.factor(Group), ref = "9")11  1.99172    0.40104   4.966 9.43e-06 ***
relevel(as.factor(Group), ref = "9")12  2.14648    0.45263   4.742 2.00e-05 ***
relevel(as.factor(Group), ref = "9")13  0.77245    0.41805   1.848 0.070937 .  
relevel(as.factor(Group), ref = "9")14  2.31402    0.38762   5.970 2.99e-07 ***
relevel(as.factor(Group), ref = "9")15  2.90654    0.40139   7.241 3.53e-09 ***
relevel(as.factor(Group), ref = "9")16  2.75175    0.39660   6.938 1.01e-08 ***
lowAttendance                           0.48309    0.25262   1.912 0.061942 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5257 on 47 degrees of freedom
Multiple R-squared:  0.797, Adjusted R-squared:  0.7236 
F-statistic: 10.86 on 17 and 47 DF,  p-value: 4.884e-11

We create a new model that includes an “Attendance” score, which is measured every week.
The Adjusted R2 increases slightly (from 0.70 to 0.72), leading us to believe that attendance in a team-based learning environment should not be scored as a weighted grade, even though it is crucial for the final score.
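As a quick consistency check, the Adjusted R2 can be recomputed from the figures printed in the regression output. The sketch uses the rounded Multiple R-squared (0.797), the 17 predictors and the 47 residual degrees of freedom, which together imply 65 observations; the variable names are chosen here for illustration.

```r
# Values read from the regression output above.
r_squared    <- 0.797  # Multiple R-squared (rounded in the output)
n_predictors <- 17     # Exa01 + 15 group dummies + lowAttendance
residual_df  <- 47     # residual degrees of freedom

# Observations = residual df + predictors + intercept.
n_obs <- residual_df + n_predictors + 1  # 65

# Adjusted R2 penalizes R2 for the number of predictors.
adj_r_squared <- 1 - (1 - r_squared) * (n_obs - 1) / residual_df
round(adj_r_squared, 4)  # 0.7236
```

The penalty matters here because 17 predictors are estimated from 65 observations, which is why the Adjusted R2 is noticeably lower than the Multiple R-squared.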