If the video does not load automatically, you can find it here: How to assess online the individual performance in team projects?
The full article is available here: link
Corresponding author: Riccardo Bonazzi, University of Applied Sciences Western Switzerland (HES-SO)
We have decided to see the current situation as an opportunity to try a new presentation format that combines (a) a short video (2 minutes) and (b) a set of slides that describe our idea in more detail. Hence, this storyboard illustrates our algorithm in a way that is …
structured. Thanks to R Markdown, the text of each slide is embedded with the code of the algorithm, so you can see how the data gets created.
dynamic. To test our models, we split our data into a training and a testing sample. In the original version executed in RStudio, the results change every time you restart the page, which shows that we did not pick a specific set of parameters to obtain good results. In this static version on the web, the results do not change over time.
At the end of the dashboard, we share some of the new insights that we are obtaining from the data of this semester.
In this article we study learning analytics for experiential learning.
Team projects can teach students how to work together, but assessing each individual contribution in a team project is hard, and no existing tool can easily assess group effects on individual performance.
To solve our problem, we assess “performativity”, which measures how much the students build on each other’s contributions, and we ask ourselves the following research question: How to rapidly assess performativity in project-based learning scenarios?
In this presentation, we want to improve the immediate feedback assessment technique (IF-AT) by developing an online solution, which could allow students to do the final exam by themselves and then to get partial credits if they managed to correct their mistakes, by discussing with their team members. This way, we could measure the degree of performativity in each team.
We position our study in the field of design science research and we describe how we developed an artefact in the shape of a prototype, by following the guidelines of Peffers et al. (2007). This presentation has been built around the 6 steps suggested by the authors.
(i-ii) Identify the problem and define the objectives of the solution –> Slide 02
(iii) Design and development of the artefact –> Slide 04
(iv) Demonstration of our methodology –> Slide 05
(v) Evaluation (preliminary data) –> Slide 06
(vi) Communication: besides presenting our results at HICSS, we have shared our preliminary insights with colleagues whose courses have been disrupted by the Covid-19 situation.
The underlying idea of DuoTest is simple: to allow students to do their final exams twice in a row.
The first time, participants do their exam individually (Exa01); the second time, they solve the same exam in groups (Exa02).
By comparing individual and team performances, the system infers the positive (or negative) effect of each group on individual performance.
Previous studies have already used Rapid Assessment Tests at both the individual and the group level; nonetheless, they did not use those tests during the final evaluation and they did not try to minimize the effort required to perform such assessments.
The figure illustrates in detail how the DuoTest can be built with an open-source learning management system (Moodle) and how the data can be analyzed with RStudio to assess team health and transactivity.
This example comes from 70 bachelor students who took the exam in January 2020 after having done a group project with a firm during the fall semester.
Step 1: Students submit their individual answers to a test online
Step 2: Students solve the same test in groups and submit their individual answers
Step 3: A (multilevel) model uses the two tests to assess the group effect
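The three steps above can be illustrated with a minimal R sketch. It assumes a hypothetical data frame `scores` with one row per student and invented values. It only computes the mean change between the two tests per group, which is a descriptive stand-in for the group effect and not the (multilevel) model itself.

```r
# Hypothetical scores: one row per student (values invented for illustration)
scores <- data.frame(
  Student = 1:8,
  Group   = factor(c(1, 1, 1, 1, 2, 2, 2, 2)),
  Exa01   = c(4.0, 4.5, 5.0, 3.5, 4.0, 5.5, 4.5, 5.0),  # Step 1: individual test
  Exa02   = c(5.0, 5.0, 5.5, 4.5, 4.0, 5.0, 4.5, 4.5)   # Step 2: same test after group work
)

# Step 3 (simplified): change in score between the two tests
scores$Gain <- scores$Exa02 - scores$Exa01

# Mean gain per group: a positive value suggests a helpful group effect,
# a negative value a harmful one
group_effect <- aggregate(Gain ~ Group, data = scores, FUN = mean)
print(group_effect)
```

In this invented example, group 1 gains on average and group 2 loses slightly, which is the kind of contrast the model in Step 3 is meant to quantify.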
The overall test can be implemented with open-source technology (we used Moodle for our demonstration). This type of test takes about 4 hours to design, and once the rules are set it is corrected automatically, like a multiple-choice quiz.
We tested our prototype with three classes of undergraduate students taking the same course, for a total of 71 students attending the final exam in Sierre (Switzerland) on 20 January 2020.
We claim that the exam was (a) valid, since the chosen questions provide useful information about the concepts seen in class, (b) reliable, thanks to the rule-driven correction of each question, and (c) recognizable, since it fully replicated the way students work during the semester.
The second graph shows that we can assign to each team a trendline, which describes the positive or negative group effect. The adjusted R2 of the linear regression is high (0.70).
In the next slides we show the detailed analysis. We take group 9 as reference, since it has the lowest score.
Insight 01: If high midterm scores lead to low final exam scores, what is the function of the midterm exam?
Insight 02: Each class might have a different degree of peer-learning effect.
The decision tree model shows a set of rules to predict the Exa02 scores. Let’s try to assess its predictive power.
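A minimal sketch of how such a tree could be trained and tested in R is given below. It uses simulated data and the `rpart` package. The predictors `Exa01` and `Midterm`, the score scale, and the sample size are assumptions made for illustration, since the variables used in our own tree are not listed here.

```r
library(rpart)  # recommended package shipped with standard R installations

set.seed(2020)  # fixed seed for a repeatable sketch; omit it to get a new split each run

# Hypothetical data: 70 students with invented scores between 7 and 10
n <- 70
scores <- data.frame(Exa01   = sample(7:9, n, replace = TRUE),
                     Midterm = sample(7:10, n, replace = TRUE))
scores$Exa02 <- factor(scores$Exa01 + sample(0:1, n, replace = TRUE),
                       levels = 7:10)

# 70% of the rows for training, the remaining 30% for testing
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- scores[train_idx, ]
test  <- scores[-train_idx, ]

# Classification tree that predicts the Exa02 score
tree <- rpart(Exa02 ~ Exa01 + Midterm, data = train, method = "class")
predicted <- predict(tree, newdata = test, type = "class")

# Predicted values as rows, real values as columns
confusion <- table(Predicted = predicted, Real = test$Exa02)
accuracy  <- sum(diag(confusion)) / sum(confusion)
print(confusion)
print(accuracy)
```

Removing `set.seed()` reproduces the behaviour of the dynamic version of the storyboard, where each run draws a new training and testing sample.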
In this table, the predicted values are shown as rows and the real values as columns. On the diagonal of the table, we can see how many times the system has correctly guessed the scores of the students.
Predicted \ Real     9     8    10     7
9                    5     1     2     0
8                    1     3     1     3
10                   1     1     1     1
7                    0     0     0     0
The accuracy of the model can be calculated as the sum of the values on the diagonal divided by the total number of values in the table.
Accuracy
0.45
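The reported value can be checked by hand from the table above, which has 9 correct predictions on the diagonal out of 20 students in the testing sample:

$$
\text{Accuracy} = \frac{5 + 3 + 1 + 0}{20} = \frac{9}{20} = 0.45
$$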
Is this value reliable? Since we took a random sample of 70% to train the algorithm and 30% to test it, the accuracy may differ if we change the training and testing sample. The p-value reported below tests whether the accuracy is higher than the no-information rate, that is, the share of the most frequent real score in the testing sample.
The p-value (0.24) is not significant, probably because we do not have enough data to obtain a reliable estimate.
AccuracyPValue
0.2376224
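The two figures above can be recomputed from the confusion table with base R. The sketch assumes that the p-value follows the convention documented for `confusionMatrix()` in the `caret` package, namely a one-sided exact binomial test of the accuracy against the no-information rate. Whether that function produced our output is an assumption, but the result can be compared with the reported 0.2376224.

```r
# Confusion table as reported above: predicted values in rows, real values in columns
confusion <- matrix(c(5, 1, 2, 0,
                      1, 3, 1, 3,
                      1, 1, 1, 1,
                      0, 0, 0, 0),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(Predicted = c("9", "8", "10", "7"),
                                    Real      = c("9", "8", "10", "7")))

correct  <- sum(diag(confusion))  # correct predictions (diagonal)
total    <- sum(confusion)        # students in the testing sample
accuracy <- correct / total

# No-information rate: share of the most frequent real score (largest column total)
nir <- max(colSums(confusion)) / total

# One-sided exact binomial test: is the accuracy higher than the no-information rate?
p_value <- binom.test(correct, total, p = nir, alternative = "greater")$p.value
print(c(Accuracy = accuracy, NoInformationRate = nir, AccuracyPValue = p_value))
```

With only 20 students in the testing sample, an accuracy of 0.45 is not far enough above a no-information rate of 0.35 to rule out chance.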
This year, the students did not have a midterm exam; instead, they had peer evaluations every week during their blended learning course.
The first graph shows that the peer-review scores are approximately normally distributed between 1.0 and 6.0 (out of 6.0).
The second graph shows the evolution over time of the weekly scores.
A violin plot has been chosen instead of a boxplot to better show the distribution of scores (link here to see why).
Phase 01 (from W01 to W04): the overall set of students splits into two subsets, associated with bimodal distributions: (1) active participants and (2) those lagging behind.
Phase 02 (from W05 to W08): the bimodal distributions stretch apart.
Phase 03 (from W09 to W12): laggers seem to try to catch up and to commit more.
Week 08 and week 12 were Q&A sessions and do not have a score; W13 is the simulation of the exam and it can be used to test how well our model can predict individual scores.
Call:
lm(formula = scale(Exa02) ~ scale(Exa01) + relevel(as.factor(Group),
ref = "9") + lowAttendance, data = final_scores)
Residuals:
Min 1Q Median 3Q Max
-1.62263 -0.18849 0.03382 0.22472 0.86861
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.18969 0.27916 -7.844 4.36e-10 ***
scale(Exa01) 0.03951 0.07667 0.515 0.608799
relevel(as.factor(Group), ref = "9")1 0.12268 0.41865 0.293 0.770777
relevel(as.factor(Group), ref = "9")2 1.55657 0.38007 4.096 0.000165 ***
relevel(as.factor(Group), ref = "9")3 2.29889 0.44744 5.138 5.27e-06 ***
relevel(as.factor(Group), ref = "9")4 1.54871 0.44384 3.489 0.001063 **
relevel(as.factor(Group), ref = "9")5 2.12212 0.41624 5.098 6.03e-06 ***
relevel(as.factor(Group), ref = "9")6 2.28810 0.45089 5.075 6.53e-06 ***
relevel(as.factor(Group), ref = "9")7 2.02183 0.39589 5.107 5.85e-06 ***
relevel(as.factor(Group), ref = "9")8 1.94661 0.36708 5.303 3.00e-06 ***
relevel(as.factor(Group), ref = "9")10 1.23299 0.41765 2.952 0.004912 **
relevel(as.factor(Group), ref = "9")11 1.99172 0.40104 4.966 9.43e-06 ***
relevel(as.factor(Group), ref = "9")12 2.14648 0.45263 4.742 2.00e-05 ***
relevel(as.factor(Group), ref = "9")13 0.77245 0.41805 1.848 0.070937 .
relevel(as.factor(Group), ref = "9")14 2.31402 0.38762 5.970 2.99e-07 ***
relevel(as.factor(Group), ref = "9")15 2.90654 0.40139 7.241 3.53e-09 ***
relevel(as.factor(Group), ref = "9")16 2.75175 0.39660 6.938 1.01e-08 ***
lowAttendance 0.48309 0.25262 1.912 0.061942 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.5257 on 47 degrees of freedom
Multiple R-squared: 0.797, Adjusted R-squared: 0.7236
F-statistic: 10.86 on 17 and 47 DF, p-value: 4.884e-11
We create a new model that includes an “Attendance” score, which is measured every week.
The adjusted R2 increases slightly (from 0.70 to 0.72), leading us to believe that attendance in a team-based learning environment should not be scored with a weighted grade, but that it is nonetheless crucial for the final score.
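The comparison between the two models can be sketched in R as follows. The formula of the extended model is the one shown in the regression output above. The data frame is simulated (16 groups of 4 students, all values invented), so the numbers it prints are not our results.

```r
set.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical data: 16 groups of 4 students (all values simulated)
n_groups <- 16
final_scores <- data.frame(Group = rep(1:n_groups, each = 4))
n <- nrow(final_scores)
final_scores$Exa01         <- rnorm(n, mean = 4.5, sd = 0.5)
final_scores$lowAttendance <- rbinom(n, size = 1, prob = 0.3)
group_shift <- rnorm(n_groups, sd = 0.6)  # simulated group effect
final_scores$Exa02 <- 4 + 0.1 * final_scores$Exa01 +
  group_shift[final_scores$Group] +
  0.3 * final_scores$lowAttendance + rnorm(n, sd = 0.3)

# Baseline model: individual score plus group effect, group 9 as reference
fit_base <- lm(scale(Exa02) ~ scale(Exa01) +
                 relevel(as.factor(Group), ref = "9"),
               data = final_scores)

# Extended model: same terms plus the attendance indicator
fit_att <- lm(scale(Exa02) ~ scale(Exa01) +
                relevel(as.factor(Group), ref = "9") + lowAttendance,
              data = final_scores)

# Compare the adjusted R-squared of the two models
print(c(base            = summary(fit_base)$adj.r.squared,
        with_attendance = summary(fit_att)$adj.r.squared))
```

The adjusted R-squared only rises when the added term explains more variance than its extra degree of freedom costs, which is why it is used here instead of the plain R-squared.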