Producing 3* and 4* REF Outputs

The Programme Context

Purpose: to build a shared starting point for developing stronger REF outputs.

Session 1: REF Familiarisation Workshop (today)
Session 2: Understanding 4\(^*\) Research in Your Field
Session 3: Recognising 3\(^*\) and 4\(^*\) Publications
Session 4: Planning Research with 4\(^*\) Potential
Session 5: Planning for REF-relevant Research Outputs
Session 6: Writing Clinic: Shaping Manuscripts for REF
Session 7: Ongoing Support

What This Session Should Achieve

By the end, we should have:

a shared understanding of why output quality matters for NTU Psychology,
a clearer sense of what 3\(^*\) and 4\(^*\) mean in practice,
a way of discussing output quality using originality, significance, and rigour,
practice in calibrating judgements using example abstracts.

What is REF?

Research Excellence Framework (REF): the UK system for assessing three broad elements of universities' research activity.

Universities submit evidence about research quality.
Submissions are organised by Units of Assessment, such as UoA 4: Psychology, Psychiatry and Neuroscience.
REF assesses research outputs (journal articles, books, software, exhibitions), impact beyond academia, and the research environment (the conditions that support research quality).
The results affect reputation and future research funding.
REF runs periodically: REF 2014, REF 2021, and the next exercise, REF 2029.

Quality of research outputs

A REF submission is not a count of everything we publish.

What makes a piece of research strong enough to submit?
How can we recognise 3\(^*\) and 4\(^*\) potential earlier?
How can we improve outputs while they are still being planned, written, or revised?
How can we discuss output quality across different areas of psychology?

We are using REF output criteria as a practical lens for strengthening research.

NTU Psychology in REF 2021

NTU submitted 58.0 FTE to UoA 4: Psychology, Psychiatry and Neuroscience.

75% of eligible staff were submitted.
145 outputs were submitted.
That is a broad research-active base.
The issue is not simply whether research is happening.
The strategic issue is the quality profile of the submitted outputs.

NTU Psychology Output Profile

Rating	Meaning	Percent	Approx. outputs
4*	World-leading	9.7%	14
3*	Internationally excellent	55.1%	80
2*	Internationally recognised	32.4%	47
1* / U/C	Nationally recognised / unclassified	2.8%	4

Most submitted outputs were 3\(^*\) or above. The main opportunity is increasing the proportion of outputs that can be defended as 4\(^*\).
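The "Approx. outputs" column can be reproduced from the percentages. The minimal Python sketch below assumes each percentage is a share of the 145 submitted outputs and rounds to the nearest whole output; because the published percentages are themselves rounded to one decimal place, the counts remain approximate.

```python
# Reproduce the "Approx. outputs" column of the NTU Psychology REF 2021
# output profile. Assumption: each percentage is a share of all 145
# submitted outputs; counts are rounded to the nearest whole output.
SUBMITTED_OUTPUTS = 145

PROFILE_PERCENT = {
    "4*": 9.7,
    "3*": 55.1,
    "2*": 32.4,
    "1* / U/C": 2.8,
}


def approx_counts(percentages, total):
    """Convert a quality profile in percent to rounded output counts."""
    return {rating: round(total * pct / 100) for rating, pct in percentages.items()}


counts = approx_counts(PROFILE_PERCENT, SUBMITTED_OUTPUTS)
three_star_or_above = PROFILE_PERCENT["4*"] + PROFILE_PERCENT["3*"]

for rating, n in counts.items():
    print(f"{rating}: {n} outputs")
print(f"3* or above: {three_star_or_above:.1f}%")
```

The rounded counts (14, 80, 47, 4) sum back to 145, and the 3\(^*\)-or-above share comes to 64.8%, which is the basis for saying most submitted outputs were 3\(^*\) or above.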

Why REF Matters to NTU’s Research Capacity

REF matters because it shapes the conditions in which research happens.

It affects reputation and external confidence in our research strength.
It informs quality-related research (QR) funding allocated by Research England.
That funding goes to the university, not directly to individuals.
Stronger REF performance gives the department a stronger case for investment (research time, PhD studentships, grant development, infrastructure).

The point is not only to perform better in REF. It is to use the criteria to help more work become genuinely strong research.

What is a Strong Output Worth?

Approximate annual quality-related research funding value of UoA4 outputs:

Output	Approx. annual QR funding value to NTU
2\(^*\) output	£0
3\(^*\) output	about £3k per year
4\(^*\) output	about £11k per year

Source: calculated from Research England’s 2025/26 NTU grant data tables, UoA4 outputs. Exact estimates: 3\(^*\) = £2,987; 4\(^*\) = £11,947.

The key point is that this value recurs every year: a 4\(^*\) output is not worth about £11k once. Across an eight-year REF cycle, that is roughly £95k per 4\(^*\) output (8 × £11,947), assuming the annual value stays broadly similar.
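A minimal Python sketch of the per-cycle arithmetic, assuming the 2025/26 annual values stay constant for all eight years (in practice QR allocations change from year to year). Multiplying the rounded £11k by eight gives about £88k; the exact estimate of £11,947 gives £95,576. The function name `cycle_value` is illustrative only.

```python
# Annual QR value of one UoA4 output, using the slide's exact estimates
# (Research England 2025/26 NTU grant data tables).
ANNUAL_VALUE_GBP = {"2*": 0, "3*": 2_987, "4*": 11_947}


def cycle_value(rating, years=8):
    """Value of one output over a cycle, assuming a constant annual rate."""
    return ANNUAL_VALUE_GBP[rating] * years


for rating in ("2*", "3*", "4*"):
    print(f"{rating}: £{cycle_value(rating):,} over 8 years")

# How many times more a 4* output earns than a 3* output each year.
ratio = ANNUAL_VALUE_GBP["4*"] / ANNUAL_VALUE_GBP["3*"]
print(f"4*/3* ratio: {ratio:.2f}")
```

This prints £0, £23,896, and £95,576 for 2\(^*\), 3\(^*\), and 4\(^*\) respectively. The 4\(^*\)/3\(^*\) ratio of about 4 appears consistent with Research England's mainstream QR weighting, which gives 4\(^*\) four times the weight of 3\(^*\) and gives 2\(^*\) no weight; that is why the 2\(^*\) row is £0.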

Why Output Quality Is Distributed

Under REF 2021 rules:

a unit had to submit 2.5 outputs per submitted FTE,
each submitted staff member normally needed at least one attributed output,
no individual could normally have more than five outputs attributed to them.

So, for 10 submitted FTE, the unit needed 25 outputs. Even if one person had many 4\(^*\) papers, only five could normally be attributed to them.

Strong REF performance depends on distributed output quality across the unit, not isolated excellence.
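A minimal Python sketch of these rules, with hypothetical function names. It ignores the exceptions implied by "normally" (such as reductions for individual circumstances) and does not model how fractional totals were rounded. With 58.0 FTE it gives 145 outputs, matching the NTU REF 2021 figures earlier in this session, and it shows how small a share of the pool any one person could supply.

```python
import math

# Simplified REF 2021 output rules, as summarised on the slide.
# Assumptions: the "normally" exceptions (e.g. reductions for individual
# circumstances) are ignored, and rounding of fractional totals is not modelled.
OUTPUTS_PER_FTE = 2.5
MIN_PER_PERSON = 1
MAX_PER_PERSON = 5


def required_outputs(submitted_fte):
    """Total outputs a unit returns for its submitted FTE."""
    return submitted_fte * OUTPUTS_PER_FTE


def max_share_from_one_person(submitted_fte):
    """Largest fraction of the output pool that one person could supply."""
    return MAX_PER_PERSON / required_outputs(submitted_fte)


def is_valid_attribution(outputs_per_person, submitted_fte):
    """Check a hypothetical per-person attribution against the simplified rules."""
    total_ok = math.isclose(sum(outputs_per_person), required_outputs(submitted_fte))
    limits_ok = all(MIN_PER_PERSON <= n <= MAX_PER_PERSON for n in outputs_per_person)
    return total_ok and limits_ok


print(required_outputs(10))                      # 25.0
print(required_outputs(58.0))                    # 145.0
print(f"{max_share_from_one_person(10):.0%}")    # 20%
print(f"{max_share_from_one_person(58.0):.1%}")  # 3.4%

# Hypothetical unit of ten staff: three people at the cap of five outputs.
print(is_valid_attribution([5, 5, 5, 2, 2, 2, 1, 1, 1, 1], 10))  # True
```

At 58.0 FTE, a single person could supply at most about 3.4% of the outputs, which is the arithmetic behind "distributed output quality".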

Why This Matters to You

REF-focused development helps with work you already care about.

Stronger papers are easier to position and defend.
Clearer contribution claims improve introductions and discussions.
Better planning can reduce late-stage rewriting.
Shared peer feedback can make outputs stronger before submission.
Calibration helps you decide where to invest time and ambition.

Join the Online Activity

Go to menti.com

Activity 1: What Counts as Strong Research?

Think about your own area of psychology.

What criteria do you use to identify a strong piece of research?
What makes you trust it?
What makes you think it matters?
What makes it more than just another competent study?

REF Output Criteria

REF outputs are judged against three criteria.

Originality: What is new, innovative, distinctive, or insight-generating?
Significance: Why does the contribution matter, and who could it influence?
Rigour: Is the work intellectually coherent, well executed, and appropriately evidenced?

How confident are you in using the REF output criteria?

Activity 2: Map Field Judgement to REF

Discuss at your tables:

Map the comments shown on menti.com to one or more REF output criteria:

Originality
Significance
Rigour

Where does it fit easily? Where does it not fit neatly?

Mapping Field Judgement to REF

What researchers often say	REF output criterion
“It adds something new.”	Originality
“It changes how we think about the problem.”	Originality / Significance
“The question matters.”	Significance
“People outside the immediate niche should care.”	Significance
“The design and analysis are convincing.”	Rigour
“The claims are supported by the evidence.”	Rigour

REF output criteria are not separate from good disciplinary judgement. They make explicit the markers of strong research that we already use.

How Outputs Are Judged

REF Criteria Are Not Arbitrary

Used well, REF criteria can help us ask better research questions.

Criterion	Question for the author	Question for the assessor
Originality	What does this add that was not already clear?	Is the contribution genuinely distinctive?
Significance	Why does this matter beyond the immediate study?	Could this influence thinking, research, policy, or practice?
Rigour	Is the claim carried by the design, evidence, and analysis?	Can I trust the route from question to conclusion?

It is not about compliance; it is about improving research before the paper is finished.

Rating Translations

Rating	Working translation	Typical judgement
2*	Useful contribution	Solid work with international relevance, but limited ambition, novelty, significance, or evidential weight.
3*	Important contribution	A convincing output that moves an area forward substantially.
4*	World-leading contribution	An outstanding output with the potential to transform, redirect, or very considerably advance the field.

The 3\(^*\) / 4\(^*\) Borderline

Common questions:

Does this change how people in the field understand the problem?
Is the conceptual contribution unusually strong?
Is the evidence unusually persuasive, rich, complex, or cumulative?
Does the work open up a new line of research, method, theory, policy, or practice?

What 4\(^*\) Does Not Require

4\(^*\) does not require:

a particular journal,
a fashionable topic,
a huge sample in every case,
a specific method,
applied impact already achieved,
perfection in every sentence.

It does require a contribution that an informed assessor can recognise as outstanding.

Rating Calibration

What Abstracts Can and Cannot Tell Us

REF assessors judge the whole output, not just the abstract.

But a good abstract should make the REF-relevant case visible.

Abstracts are useful for first-pass calibration, but they are not enough for a final REF judgement, especially on rigour.

Activity 3: Abstract Evaluation

You will see four REF-submitted output examples.

Score all four abstracts individually on Menti for the visible evidence:

Originality: 1-5
Significance: 1-5
Rigour: 1-5

Use 1 = unclear evidence, 3 = plausible evidence, 5 = very strong evidence.

We will use the results to identify a case for discussion.

Examples are based on outputs submitted to REF 2021 UoA4. Summaries are paraphrased and deliberately vary in how strongly they signal the REF criteria.

Example A: Mental Health

Title: Clinical and economic outcomes of remotely delivered cognitive behaviour therapy versus treatment as usual for repeat unscheduled care users with severe health anxiety: a multi-centre randomised controlled trial

This multi-centre randomised trial tested whether remotely delivered CBT could help people with severe health anxiety who repeatedly used unscheduled healthcare. The study compared remote CBT with treatment as usual and assessed clinical outcomes, healthcare use, quality of life, and cost-effectiveness over follow-up. The intervention reduced health anxiety and related symptoms, with evidence of lower costs. The authors argue that the approach may offer a scalable way to engage a patient group that services often struggle to support.

REF 2021 UoA4 submitted output. DOI: 10.1186/s12916-019-1253-5. Summary paraphrased.

Example B: Forensic Psychology

Title: Advances in Facial Composite Technology, Utilizing Holistic Construction, Do Not Lead to an Increase in Eyewitness Misidentifications Compared to Older Feature-Based Systems

This experimental study examined whether constructing a facial composite affects later eyewitness identification. Participants viewed a staged crime and were assigned to construct a feature-based composite, construct a holistic composite, or complete no composite task. They later undertook an identification procedure after a delay intended to better resemble investigative conditions. The results suggested that neither composite method reduced later identification accuracy. The authors discuss implications for police use of facial-composite systems and eyewitness evidence.

REF 2021 UoA4 submitted output. DOI: 10.3389/fpsyg.2019.01962. Summary paraphrased.

Example C: Cognitive Psychology

Title: Cost-benefit trade-offs in decision-making and learning

This study asked whether the cognitive cost of managing conflict is part of value-based decision-making and learning. Participants completed a reversal-learning task with distracting flanker information, allowing comparison between free and instructed choices. Computational modelling tested how distractors, uncertainty, reward expectations, and conflict shaped decisions and learning rates. The results suggested that people sometimes follow irrelevant cues when cognitive control is costly, and that different forms of conflict can affect learning differently.

REF 2021 UoA4 submitted output. DOI: 10.1371/journal.pcbi.1007326. Summary paraphrased.

Example D: Autism / Developmental Psychology

Title: “Having All of Your Internal Resources Exhausted Beyond Measure and Being Left with No Clean-Up Crew”: Defining Autistic Burnout

This qualitative study used a community-based participatory approach to define autistic burnout, a concept widely discussed by autistic adults but underdeveloped in academic and clinical literature. The researchers analysed interviews and public internet sources, identifying recurring features including chronic exhaustion, reduced tolerance to stimulus, and loss of skills. They also identified contributors such as life stress, masking, and barriers to support. The paper argues that autistic burnout is distinct from occupational burnout and depression, with implications for recognition, support, stigma reduction, and suicide prevention.

REF 2021 UoA4 submitted output. DOI: 10.1089/aut.2019.0079. Summary paraphrased.

Activity 3: Debrief

We will discuss the abstract with the widest spread of scores.

Which example made the strongest case on each criterion?
Which criterion was easiest to judge from the abstract?
Which criterion was hardest to judge?
What wording made the contribution visible?
What would you need from the full paper for a stronger judgement?

The point is not to agree; it is to make disagreements diagnostic.
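One way to operationalise "widest spread" is sketched below in Python, assuming the Menti scores can be exported as lists of 1-5 ratings per criterion for each example. As an assumption, spread is defined here as the mean population standard deviation across the three criteria; the range (highest minus lowest score) would be an equally defensible choice. Function names and the example data are hypothetical.

```python
from statistics import mean, pstdev


def disagreement(scores_by_criterion):
    """Mean population SD of 1-5 scores across the criteria."""
    return mean(pstdev(scores) for scores in scores_by_criterion.values())


def widest_spread(results):
    """Return the example whose scores show the most disagreement."""
    return max(results, key=lambda example: disagreement(results[example]))


# Invented scores for illustration only; these are not real Menti results.
results = {
    "X": {"Originality": [3, 3, 4], "Significance": [5, 5, 4], "Rigour": [5, 4, 5]},
    "Y": {"Originality": [5, 2, 4], "Significance": [5, 3, 5], "Rigour": [1, 4, 3]},
}
print(widest_spread(results))  # Y
```

Averaging per-criterion spreads, rather than pooling all scores for an example, keeps genuine disagreement between participants separate from differences between the three criteria.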

Facilitator Calibration

Example	Originality	Significance	Rigour	Why this score?
A: Mental health	3	5	5	Strong applied significance and robust trial/economic design; originality depends on novelty of remote delivery and target group.
B: Forensic	3	3	4	Clear experimental test with practical relevance; contribution appears more bounded and incremental from the summary alone.
C: Cognitive	5	4	5	Strong conceptual integration and modelling; high rigour; significance mainly theoretical rather than directly applied.
D: Autism	5	5	3	Highly original construct-definition work with clear stakeholder relevance; rigour needs careful checking in the full qualitative methods.

Use this after discussion, not before scoring. The numbers are a proposed calibration for the summaries, not known REF scores for the outputs.

The aim is not to add REF language to papers

The aim is to build high-quality output thinking into:

the research question,
the design,
the evidence,
the collaboration,
the writing,
and the feedback process.

What support would help you produce stronger outputs? [menti.com]