The impact of Generative AI on students’ marks

Dr Peter K. Dunn (UniSC)
Slides at: https://rpubs.com/PeterKD/1290652

01 April 2025

Context

Setting

Level: SCI110 (Science Research Methods) is a first-year, non-calculus statistics course
Size: Over 1200 students each year (over 450 students in each of two semesters)
Abilities: Students have varied mathematics ability

Content

Writing research questions
Basics of designing studies
(Hawthorne effect; confounding; sampling; etc.)
Graphing
Computing confidence intervals
(means; proportions; mean difference; odds ratios, etc.)
Performing hypothesis testing
(\(t\)-tests; \(\chi^2\)-tests; simple linear regression; not ANOVA)

Disciplines

Many science disciplines:
Environmental Management; Animal Ecology; Mathematics; etc…
Many allied health disciplines (not psychology):
Paramedic Science; Biomedical Science; Clinical Exercise Physiology; Dietetics; etc…

Red flags

A large course (and one statistician…)
Students are generally in their first year (and often their first semester) of uni
A course no-one really wants to do (no statistics major at UniSC)
A course with students of varying levels of ability, skills, disciplines, …
At UniSC, 38.6% are first-in-family
A course that includes… MATHS!

Assessment details

Assessment

Quizzes: \(25\)% (five quizzes, \(5\)% each)
Project Proposal: \(15\)% (group work): students complete a Pro Forma
Project Report: \(20\)% (group work): students produce PowerPoint slides
Exam at end of semester: \(40\)% (essentially an online quiz)

Assessment: quizzes

Quizzes: Five online quizzes (\(5\)% each; total of \(25\)%)
- Open for one week each; notional \(2\)-hr limit per attempt
- Question pools: each attempt likely to be different
- Unlimited number of attempts; highest-ever mark recorded
- Feedback provided online
- Students encouraged to ask questions and seek help; quizzes promoted as ‘learning opportunities’

Assessment: Project Proposal

Project: A very small, very simple research project
Part A: Project Proposal (group work; worth \(15\)%)
- Students devise a RQ and plan research design: human creativity involved
- Free choice, with many restrictions (ethical; practical and feasible; achievable)
- Students complete a Pro Forma
- Substantial written feedback provided
- AI permitted for help with generating a RQ, and must be declared
- AI discouraged elsewhere, due to Pro Forma, ethics and restrictions placed on RQ

Assessment: Project Report

Project: A very small, very simple research project
Part B: Project Report (group work; worth \(20\)%)
- Based on Part A and feedback, students collect data, analyse data, and write a report
- Report is a set of (roughly \(25\)–\(30\)) PowerPoint slides
- Students do not have to deliver a presentation
- AI permitted for help with writing, and must be declared
- AI discouraged elsewhere, due to specific language used, specific tests taught

Assessment: exam

Exam (\(40\)%, online quiz)
- Un-invigilated (UniSC restriction)
- Multiple-choice questions with one correct answer
- Questions in random order
- Options in random order
- Each question number has a pool of three questions from which to draw: each exam is likely to be different
- Disclaimer at start of exam: AI is not to be used
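
Why pooled questions make each exam "likely to be different" can be sketched as below. This is an illustration only: the number of question slots (30) and the question labels are hypothetical, and draws are assumed to be independent and uniform within each pool.

```python
import random


def count_question_sets(n_questions: int, pool_size: int = 3) -> int:
    """Number of distinct question selections when each slot has its own pool."""
    return pool_size ** n_questions


def prob_same_questions(n_questions: int, pool_size: int = 3) -> float:
    """Chance that two independent draws select exactly the same questions."""
    return 1 / count_question_sets(n_questions, pool_size)


def draw_exam(pools, rng):
    """Pick one question from each pool, then shuffle the presentation order."""
    chosen = [rng.choice(pool) for pool in pools]
    rng.shuffle(chosen)
    return chosen


# Hypothetical exam: 30 question slots, each with a pool of three variants.
N_QUESTIONS = 30
pools = [[f"Q{i}{v}" for v in "abc"] for i in range(1, N_QUESTIONS + 1)]
exam = draw_exam(pools, random.Random(1))

print(len(exam), "questions drawn")
print(count_question_sets(N_QUESTIONS), "possible question selections")
print(prob_same_questions(N_QUESTIONS), "chance two students share a selection")
```

Random question and option order adds further variation on top of the selection itself, which this count ignores.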

Quick poll

In which assessment items will AI have the greatest and least impact?

Go to: www.menti.com/alzzqbu4knky

Our study

Timeline of GenAI

Research questions

RQs one

RQ1: What is the change in the mean marks pre- to post-AI, for each assessment task?

RQs two

RQ2: How has the correlation between assessments changed from pre- to post-AI?

RQs three

RQ3: How does the grade distribution change pre- to post-AI?

RQs four

RQ4: How have students’ attempts and marks in the sample examinations changed from pre- to post-AI?

Results

Number of students

Offer	Number of students	When
2022 SEM1	673	Pre-AI
2022 SEM2	529	Pre-AI
2023 SEM1	579	Transition to AI
2023 SEM2	496	Transition to AI
2024 SEM1	559	Post-AI
2024 SEM2	473	Post-AI

Quiz marks

Project Proposal marks

Project Report marks

Exam marks

Overall marks

ChatGPT users

RQ1 Results: changes in marks

Changes in marks: quizzes

Mean Quiz marks
	2022	2024 SEM	Change	\(P\)-value
ALL students	\(64.64\)	\(68.77\)	\(4.12\uparrow\)	\(0.003\)

Passing students	\(72.05\)	\(76.40\)	\(4.35\uparrow\)	<\(0.001\)
Failing students	\(32.17\)	\(24.74\)	\(-7.43\downarrow\)	\(0.007\)

Passing students did slightly better
Failing students did slightly worse
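
A 'Change' and a \(P\)-value of the kind tabulated for RQ1 could come from a two-sample comparison of means. The sketch below is not the study's analysis: it assumes a Welch-type statistic with a normal approximation (reasonable only for large groups), and the marks are simulated, not real. The function name `welch_change` is hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance


def welch_change(pre, post):
    """Return (change in mean, z statistic, two-sided P-value).

    Uses unequal-variance standard errors and a normal approximation
    to the reference distribution, so it suits large samples only.
    """
    diff = mean(post) - mean(pre)
    se = math.sqrt(variance(pre) / len(pre) + variance(post) / len(post))
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, z, p


# Simulated marks (percentages), clipped to 0-100; NOT the course data.
rng = random.Random(2025)
pre = [min(100, max(0, rng.gauss(60, 15))) for _ in range(500)]
post = [min(100, max(0, rng.gauss(65, 15))) for _ in range(500)]

diff, z, p = welch_change(pre, post)
print(f"change = {diff:.2f}, z = {z:.2f}, P = {p:.4f}")
```

Running the same function separately on passing and failing students would give the kind of split shown in the table.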

Changes in marks: proposal

Mean Task 2A marks
	2022	2024 SEM	Change	\(P\)-value
ALL students	\(61.49\)	\(63.89\)	\(2.32\)	\(0.190\)

Passing students	\(76.54\)	\(76.69\)	\(0.15\)	\(0.917\)
Failing students	\(49.34\)	\(43.93\)	\(-5.51\)	\(0.154\)

No statistically significant changes
GenAI perhaps less useful: requires human creativity?

Changes in marks: report

Mean Task 2B marks
	2022	2024 SEM	Change	\(P\)-value
ALL students	\(70.88\)	\(63.62\)	\(-10.44\downarrow\)	<\(0.001\)

Passing students	\(78.24\)	\(66.86\)	\(-11.38\downarrow\)	<\(0.001\)
Failing students	\(38.54\)	\(21.16\)	\(-17.38\downarrow\)	<\(0.001\)

ChatGPT did not use correct language…?
ChatGPT did not suggest correct test…?

Changes in marks: exam

Mean Exam marks
	2022	2024 SEM	Change	\(P\)-value
ALL students	\(49.59\)	\(71.47\)	\(21.88\uparrow\)	<\(0.001\)

Passing students	\(56.49\)	\(73.90\)	\(17.40\uparrow\)	<\(0.001\)
Failing students	\(19.16\)	\(41.82\)	\(22.66\uparrow\)	<\(0.001\)

Exam is online and not invigilated
M/C questions easy to cut-and-paste into ChatGPT?

Changes in marks: overall marks

Mean Overall marks
	2022	2024 SEM	Change	\(P\)-value
ALL students	\(60.96\)	\(66.29\)	\(5.19\uparrow\)	<\(0.001\)

Passing students	\(67.74\)	\(73.46\)	\(5.71\uparrow\)	<\(0.001\)
Failing students	\(31.18\)	\(24.89\)	\(-6.30\downarrow\)	\(0.007\)

Passing students did slightly better
Failing students did slightly worse

RQ2 Results: correlations between assessment items

Results: between exam and quizzes

Results: between exam and proposal

Results: between exam and report

A correlation between Task 2B and the Exam is expected
Latest offer: No correlation between Task 2B and Exam
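
The comparison behind RQ2 can be sketched with a Pearson correlation computed separately for each offer. The marks below are invented to show one strongly correlated and one essentially uncorrelated case; they are not the course data, and the study may have used a different correlation measure.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (report mark, exam mark) pairs for two offers.
offers = {
    "pre-AI": ([55, 62, 70, 78, 85, 91], [40, 48, 55, 63, 70, 82]),
    "post-AI": ([55, 62, 70, 78, 85, 91], [72, 68, 75, 70, 74, 69]),
}

results = {name: pearson_r(report, exam) for name, (report, exam) in offers.items()}
for name, r in results.items():
    print(f"{name}: r = {r:.2f}")
```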

RQ3 Results: grade distributions

Results: grade distributions

Pass rate almost the same every offer
The distribution of passing grades has changed
In general, a shift towards higher grades

Results: grade distributions

Higher proportion of HDs
Much higher proportion of DNs
Lower proportion of PSs
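
A grade distribution like those compared for RQ3 can be tabulated by banding overall marks. In this sketch the cut-offs (HD 85+, DN 75+, CR 65+, PS 50+, otherwise FL) and the CR and FL labels are assumptions, and the marks are invented.

```python
from collections import Counter

# Assumed grade bands: (minimum mark, grade label), highest first.
GRADE_BANDS = [(85, "HD"), (75, "DN"), (65, "CR"), (50, "PS")]
GRADE_ORDER = ("HD", "DN", "CR", "PS", "FL")


def grade(mark):
    """Map an overall mark (0-100) to a grade label."""
    for cutoff, label in GRADE_BANDS:
        if mark >= cutoff:
            return label
    return "FL"


def grade_percentages(marks):
    """Percentage of students receiving each grade."""
    counts = Counter(grade(m) for m in marks)
    return {g: 100 * counts[g] / len(marks) for g in GRADE_ORDER}


# Hypothetical overall marks for two offers; NOT the course data.
pre_marks = [45, 52, 58, 61, 66, 72, 77, 88]
post_marks = [44, 55, 68, 76, 79, 81, 86, 90]

for name, marks in (("pre-AI", pre_marks), ("post-AI", post_marks)):
    print(name, grade_percentages(marks))
```

In this invented example the fail percentage is the same in both offers while passing grades shift upwards, which is the pattern the slides describe.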

RQ4 Results: sample exams

Results: sample exams

Fewer students attempting sample exams
Mean mark on sample exams increased slightly

Conclusions

Marks have changed substantially since GenAI was introduced
Exam marks have increased substantially (and significantly)
Report marks have decreased substantially (and significantly)
Passing and failing students were impacted differently for quizzes and overall marks
The percentage of DNs has increased; the percentage of PSs has decreased
The percentage of fails has barely changed

References

Dunn, P. K. Changes in students marks in a first-year statistics course after the introduction of GenAI. Submitted.

Thanks to Samuel Dunn for help writing some JavaScript code to extract group information from Canvas.

Image credits

Background images by:

Uscjkirklan - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=81992068
Free-Photos from Pixabay
Tom und Nicki Löschner from Pixabay
Jess Foami from Pixabay
Free-Photos from Pixabay
Syauqi Fillah from Pixabay
AkshayaPatra Foundation from Pixabay
