The impact of Generative AI on students’ marks

Dr Peter K. Dunn (UniSC)
Slides at: https://rpubs.com/PeterKD/1290652

01 April 2025

Context

Setting

  • Level: SCI110 (Science Research Methods) is a first-year, non-calculus statistics course
  • Size: Over 1000 students each year (over 450 students in each of two semesters)
  • Abilities: Students have varied mathematics ability

Content

  • Writing research questions
  • Basics of designing studies
    (Hawthorne effect; confounding; sampling; etc.)
  • Graphing
  • Computing confidence intervals
    (means; proportions; mean differences; odds ratios; etc.)
  • Performing hypothesis tests
    (\(t\)-tests; \(\chi^2\)-tests; simple linear regression; not ANOVA)

Disciplines

  • Many science disciplines:
    Environmental Management; Animal Ecology; Mathematics; etc.
  • Many allied health disciplines (not psychology): Paramedic Science; Biomedical Science; Clinical Exercise Physiology; Dietetics; etc.

Red flags

  • A large course (and one statistician…)
  • Students are generally in their first year (and often first semester) of uni
  • A course no-one really wants to do (no statistics major at UniSC)
  • A course with students of varying levels of ability, skills, disciplines, …
  • At UniSC, 38.6% of students are first-in-family
  • A course that includes… MATHS!

Assessment details

Assessment

  • Quizzes: 25% (five quizzes, 5% each)

  • Project Proposal: \(15\)% (group work): students complete a Pro Forma

  • Project Report: \(20\)% (group work): students produce PowerPoint slides

  • Exam: \(40\)% at end of semester (essentially an online quiz)
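A minimal sketch of how the four weights combine, assuming the overall mark is simply the weighted sum of the task marks (each out of 100) with no scaling or rounding rules applied. `WEIGHTS` and `overall_mark` are illustrative names, not anything used in the course.

```python
# Weights from the assessment breakdown (assumed to be applied directly)
WEIGHTS = {
    "quizzes": 0.25,   # five quizzes, 5% each
    "proposal": 0.15,  # Project Proposal (group work)
    "report": 0.20,    # Project Report (group work)
    "exam": 0.40,      # end-of-semester online exam
}


def overall_mark(marks: dict[str, float]) -> float:
    """Weighted overall mark (0-100), assuming each task mark is out of 100."""
    if set(marks) != set(WEIGHTS):
        raise ValueError(f"expected marks for {sorted(WEIGHTS)}")
    return sum(WEIGHTS[task] * marks[task] for task in WEIGHTS)


# Example: 70 (quizzes), 60 (proposal), 65 (report), 50 (exam)
print(round(overall_mark({"quizzes": 70, "proposal": 60, "report": 65, "exam": 50}), 2))  # 59.5
```

Because the exam carries 40% of the weight, a change in exam marks moves the overall mark more than a change of the same size in any other single task.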

Assessment: quizzes

  • Quizzes: Five online quizzes (\(5\)% each; total of \(25\)%)
    • Open for one week each; notional \(2\)-hr limit per attempt
    • Question pools: each attempt likely to be different
    • Unlimited number of attempts; highest-ever mark recorded
    • Feedback provided online
    • Students encouraged to ask questions and seek help; quizzes promoted as ‘learning opportunities’

Assessment: Project Proposal

  • Project: A very small, very simple research project
  • Part A: Project Proposal (group work; worth \(15\)%)
    • Students devise a research question (RQ) and plan the research design: human creativity is involved
    • Free choice, with many restrictions (ethical; practical and feasible; achievable)
    • Students complete a Pro Forma
    • Substantial written feedback provided
    • AI permitted for help with generating an RQ, and its use must be declared
    • AI discouraged elsewhere, due to the Pro Forma, ethics requirements and the restrictions placed on the RQ

Assessment: Project Report

  • Project: A very small, very simple research project
  • Part B: Project Report (group work; worth \(20\)%)
    • Based on Part A and the feedback, students collect and analyse data, then write a report
    • Report is a set of roughly \(25\) to \(30\) PowerPoint slides
    • Students do not have to deliver a presentation
    • AI permitted for help with writing, and must be declared
    • AI discouraged elsewhere, due to the specific language used and the specific tests taught

Assessment: exam

  • Exam (\(40\)%, online quiz)
    • Un-invigilated (UniSC restriction)
    • Multiple-choice questions with one correct answer
    • Questions in random order
    • Options in random order
    • Each question number has a pool of three questions from which to draw: each exam is likely to be different
    • Disclaimer at start of exam: AI is not to be used
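The pool-and-shuffle design can be sketched as below, assuming each question number has exactly three versions drawn uniformly at random; the question stems, the four options per version, and the ten-question exam are hypothetical placeholders. With three versions per question number, two independently drawn exams of \(n\) questions contain the same version of every question with probability \((1/3)^n\), before question order and option order are even considered.

```python
import random


def draw_exam(pools, rng):
    """Draw one exam: one version per question number, then shuffle.

    `pools[i]` holds the (hypothetical) versions of question number i,
    each as a (stem, options) pair where options is a list of strings.
    """
    exam = []
    for pool in pools:
        stem, options = rng.choice(pool)  # pick one version from the pool
        shuffled = options[:]             # copy so the pool itself is untouched
        rng.shuffle(shuffled)             # options in random order
        exam.append((stem, shuffled))
    rng.shuffle(exam)                     # questions in random order
    return exam


def prob_same_questions(n_questions, pool_size=3):
    """Chance that two independent draws pick the same version of every question."""
    return (1 / pool_size) ** n_questions


# Hypothetical exam: 10 question numbers, 3 versions each, 4 options per version
pools = [
    [(f"Q{i}v{v}", ["A", "B", "C", "D"]) for v in range(3)]
    for i in range(10)
]
exam = draw_exam(pools, random.Random(2025))
print(len(exam))                  # 10
print(prob_same_questions(10))    # about 1.7e-05
```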

Quick poll

In which assessment items will AI have the greatest and least impact?

Go to: www.menti.com/alzzqbu4knky →

Our study

Timeline of GenAI

Research questions

  • RQ1: What is the change in the mean marks from pre- to post-AI, for each assessment task?
  • RQ2: How has the correlation between assessment tasks changed from pre- to post-AI?
  • RQ3: How has the grade distribution changed from pre- to post-AI?
  • RQ4: How have students’ attempts and marks in the sample examinations changed from pre- to post-AI?

Results

Number of students

Offer        Number of students   When
2022 SEM1    673                  Pre-AI
2022 SEM2    529                  Pre-AI
2023 SEM1    579                  Transition to AI
2023 SEM2    496                  Transition to AI
2024 SEM1    559                  Post-AI
2024 SEM2    473                  Post-AI
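The yearly totals implied by this table can be checked directly; the enrolment figures below are copied from the table, and nothing else is assumed.

```python
# Enrolments from the table: (year, semester) -> number of students
students = {
    (2022, "SEM1"): 673, (2022, "SEM2"): 529,
    (2023, "SEM1"): 579, (2023, "SEM2"): 496,
    (2024, "SEM1"): 559, (2024, "SEM2"): 473,
}

# Sum the two semesters in each year
totals = {}
for (year, _semester), n in students.items():
    totals[year] = totals.get(year, 0) + n

print(totals)  # {2022: 1202, 2023: 1075, 2024: 1032}
```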

Quiz marks

Project Proposal marks

Project Report marks

Exam marks

Overall marks

ChatGPT users

RQ1 Results: changes in marks

Changes in marks: quizzes

Mean Quiz marks
2022 2024 SEM Change \(P\)-value
ALL students \(64.64\) \(68.77\) \(4.12\uparrow\) \(0.003\)
Passing students \(72.05\) \(76.40\) \(4.35\uparrow\) <\(0.001\)
Failing students \(32.17\) \(24.74\) \(-7.43\downarrow\) \(0.007\)
  • Passing students did slightly better
  • Failing students did slightly worse
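The slides do not state which test produced the \(P\)-values. As a hedged sketch only, assuming a Welch two-sample \(t\)-test (unequal variances) on 2022 versus 2024 marks, the Change and \(P\)-value columns could be computed as below. Because each offer has several hundred students, the sketch uses a normal approximation for the \(P\)-value; if SciPy is available, `scipy.stats.ttest_ind(after, before, equal_var=False)` gives the \(t\)-based version. The split into passing and failing students assumes a pass is an overall mark of at least 50, and all marks are simulated, not the study's data.

```python
import math
import random
import statistics


def welch_change(before, after):
    """Return (mean change, two-sided P-value) for after minus before.

    Uses Welch's t statistic (unequal variances); the P-value uses a
    normal approximation, reasonable only for large groups.
    """
    m1, m2 = statistics.fmean(before), statistics.fmean(after)
    v1, v2 = statistics.variance(before), statistics.variance(after)
    se = math.sqrt(v1 / len(before) + v2 / len(after))  # Welch standard error
    t = (m2 - m1) / se
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided standard normal tail
    return m2 - m1, p


def split_by_pass(marks, overall, pass_mark=50):
    """Hypothetical split of task marks by whether the overall mark is a pass."""
    passing = [m for m, o in zip(marks, overall) if o >= pass_mark]
    failing = [m for m, o in zip(marks, overall) if o < pass_mark]
    return passing, failing


def simulate(n, mean, rng):
    """Simulated (NOT real) quiz and overall marks, clipped to 0-100."""
    quiz = [min(100.0, max(0.0, rng.gauss(mean, 20))) for _ in range(n)]
    overall = [min(100.0, max(0.0, q - 5 + rng.gauss(0, 10))) for q in quiz]
    return quiz, overall


rng = random.Random(1)
quiz_2022, overall_2022 = simulate(600, 65, rng)
quiz_2024, overall_2024 = simulate(500, 69, rng)

change, p = welch_change(quiz_2022, quiz_2024)
print(f"ALL students: change {change:.2f}, P = {p:.3f}")

pass_2022, _ = split_by_pass(quiz_2022, overall_2022)
pass_2024, _ = split_by_pass(quiz_2024, overall_2024)
change, p = welch_change(pass_2022, pass_2024)
print(f"Passing students: change {change:.2f}, P = {p:.3f}")
```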

Changes in marks: proposal

Mean Task 2A (Project Proposal) marks
2022 2024 SEM Change \(P\)-value
ALL students \(61.49\) \(63.89\) \(2.32\) \(0.190\)
Passing students \(76.54\) \(76.69\) \(0.15\) \(0.917\)
Failing students \(49.34\) \(43.93\) \(-5.51\) \(0.154\)
  • No statistically significant changes (all \(P > 0.15\))
  • GenAI perhaps less useful: requires human creativity?

Changes in marks: report

Mean Task 2B (Project Report) marks
2022 2024 SEM Change \(P\)-value
ALL students \(70.88\) \(63.62\) \(-10.44\downarrow\) <\(0.001\)
Passing students \(78.24\) \(66.86\) \(-11.38\downarrow\) <\(0.001\)
Failing students \(38.54\) \(21.16\) \(-17.38\downarrow\) <\(0.001\)
  • ChatGPT did not use correct language…?
  • ChatGPT did not suggest correct test…?

Changes in marks: exam

Mean Exam marks
2022 2024 SEM Change \(P\)-value
ALL students \(49.59\) \(71.47\) \(21.88\uparrow\) <\(0.001\)
Passing students \(56.49\) \(73.90\) \(17.40\uparrow\) <\(0.001\)
Failing students \(19.16\) \(41.82\) \(22.66\uparrow\) <\(0.001\)
  • Exam is online and not invigilated
  • M/C questions easy to cut-and-paste into ChatGPT?

Changes in marks: overall marks

Mean Overall marks
2022 2024 SEM Change \(P\)-value
ALL students \(60.96\) \(66.29\) \(5.19\uparrow\) <\(0.001\)
Passing students \(67.74\) \(73.46\) \(5.71\uparrow\) <\(0.001\)
Failing students \(31.18\) \(24.89\) \(-6.30\downarrow\) \(0.007\)
  • Passing students did slightly better
  • Failing students did slightly worse

RQ2 Results: correlations between assessment items

Results: between exam and quizzes

Results: between exam and proposal

Results: between exam and report

  • A correlation between Task 2B (Project Report) and the Exam would be expected
  • Latest offer: no correlation between Task 2B and the Exam

RQ3 Results: grade distributions

Results: grade distributions

  • Pass rate almost the same in every offer
  • The distribution of passing grades has changed
  • In general, a shift towards higher grades

  • Higher proportion of HDs (High Distinctions)
  • Much higher proportion of DNs (Distinctions)
  • Lower proportion of PSs (Passes)
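A sketch of how grade distributions could be tabulated for RQ3. The cut-offs used (HD 85 and above, DN 75 to 84, CR 65 to 74, PS 50 to 64, FL below 50) are common Australian bands assumed for illustration; UniSC's actual cut-offs may differ, and the marks are made up.

```python
from collections import Counter

# Assumed grade bands (common Australian bands; UniSC's cut-offs may differ)
BANDS = [(85, "HD"), (75, "DN"), (65, "CR"), (50, "PS"), (0, "FL")]


def grade(mark):
    """Map an overall mark (0-100) to a grade using the assumed bands."""
    for cutoff, label in BANDS:
        if mark >= cutoff:
            return label
    raise ValueError(f"mark out of range: {mark}")


def grade_distribution(marks):
    """Proportion of students in each grade, listed from HD down to FL."""
    counts = Counter(grade(m) for m in marks)
    return {label: counts[label] / len(marks) for _, label in BANDS}


dist = grade_distribution([92, 80, 78, 70, 55, 52, 40, 88])
print(dist)  # {'HD': 0.25, 'DN': 0.25, 'CR': 0.125, 'PS': 0.25, 'FL': 0.125}
print("Pass rate:", 1 - dist["FL"])  # Pass rate: 0.875
```

Comparing these proportions across offers shows how the pass rate (one minus the FL proportion) can stay almost constant while the mix of passing grades shifts.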

RQ4 Results: sample exams

Results: sample exams

  • Fewer students attempting sample exams
  • Mean mark on sample exams increased slightly

Conclusions

  • After GenAI was introduced, marks changed substantially
  • Exam marks have increased substantially (and significantly)
  • Report marks have decreased substantially (and significantly)
  • Passing and failing students were affected differently in the quizzes and in overall marks
  • The percentage of DN grades has increased; the percentage of PS grades has decreased
  • The percentage of fails has barely changed

References

Dunn, P. K. Changes in students marks in a first-year statistics course after the introduction of GenAI. Submitted.

Thanks to Samuel Dunn for help writing some JavaScript code to extract group information from Canvas.

Image credits

Background images by:
