Jose F Meneses-Echavez
Prepared by: Jose F. Meneses-Echavez. Collaborators: Angela Susan
Labberton, Hinta Meijerink. Seksjon for luft-, blod- og seksuell smitte.
Avdeling for smittevern og vaksine. Folkehelseinstituttet. Oslo,
Norway.
Contact: Jose Meneses-Echavez; jose.meneses@fhi.no.
Date: 20.05.2026.
This document describes how we used AIscreenR to support study
selection at the title and abstract level for the literature search
results. The entire team developed the screening prompt; Jose Meneses
prepared the input files for AIscreenR and ran the screening
functions. The resulting outputs are saved in the project repository,
available at: https://doi.org/10.17605/OSF.IO/H2EQY.
The following is a step-by-step description of our workflow for using
AIscreenR.
We first load the libraries required to run AIscreenR and then import
the project data stored in our internal OneDrive folder. To ensure the
dataset is imported correctly, we specify the full file path
explicitly.
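The setup step itself is not shown in this document. A minimal sketch, assuming the packages implied by the functions used below (read_refs and write_refs from synthesisr, tabscreen_gpt from AIscreenR, the pipe and mutate from dplyr, and writexl::write_xlsx). The exact package list used by the team is an assumption.

```r
# Packages inferred from the functions called in this workflow
# (an assumption, not the team's recorded setup).
pkgs <- c("AIscreenR", "synthesisr", "dplyr", "writexl")

# Identify packages that are not installed, without attaching anything yet.
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Install before continuing: ", paste(missing_pkgs, collapse = ", "))
}

# Attach the packages that are available.
for (p in pkgs[is_installed]) {
  library(p, character.only = TRUE)
}
```

Checking for missing packages first gives one clear message instead of a failure partway through the script.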
Importing the dataset
## coverage_references <- read_refs("C:\\Users\\JOFM\\OneDrive - Folkehelseinstituttet\\Vaccine Surveillance - Literature search (Jose)\\Dekning_sok.Library.txt")
Users must encrypt their own OpenAI API key to access the language models. Further information on this process is available at: https://mikkelvembye.github.io/AIscreenR/
Jose Meneses drafted the screening prompt. The team then gave feedback, revised the text for consistency, and approved the final version.
## prompt_coverage1 <-"Role: You are an expert epidemiological Researcher specializing in Systematic Reviews. Your task is to perform title and abstract screening with high precision. Task: Evaluate the provided study against the strict eligibility criteria below. You must follow a sequential Negative Screening (Exclusion) followed by Positive Screening (Inclusion) logic. Phase 1: Mandatory Exclusion Criteria - Immediately EXCLUDE the study if it meets ANY of the following: Wrong Focus: Evaluates interventions, clinical trials, public health campaigns, or policies designed to increase coverage (We only want observational/status reporting). Non-Human: The study is not conducted on human populations. Wrong Geography: Focuses on populations entirely outside the UK or Nordic countries (Denmark, Finland, Iceland, Norway, Sweden). Wrong Study Design: Literature, scoping, rapid, mapping, or systematic reviews; modelling studies; seroprevalence studies; lab studies; or vaccine effectiveness/efficacy studies. Wrong Publication Type: Conference abstracts, study protocols, editorials, or commentaries. Phase 2: Inclusion Criteria. If the study passed Phase 1, it must meet BOTH of the following to be INCLUDED: Core Topic: The study must explicitly report, analyze, or measure vaccination coverage/uptake for one or more of the following: Pertussis, Rotavirus, Diphtheria, Tetanus, Poliomyelitis, Hib, Hepatitis B, Pneumococcus, MMR (Measles, Mumps, Rubella), HPV, Tuberculosis, Influenza, Covid-19, Meningitis/Meningococcal, or RSV. Location: The data must explicitly cover the United Kingdom or at least one Nordic country (Denmark, Finland, Iceland, Norway, Sweden).Phase 3: Output Format. Provide your assessment using this exact structure: Decision: [INCLUDE or EXCLUDE] Triggered Criterion: [If excluded, name the specific rule (e.g., Wrong Study Design). If included, state Meets all inclusion criteria.] Rationale: [Provide a concise, evidence-based justification for your decision. 
If excluded, specify exactly which part of the abstract triggered the rule. If included, confirm the specific vaccine and geography identified.]"
We restrict the dataset to the variables required by AIscreenR.
Because the tool, through the LLM, evaluates titles and abstracts to
make inclusion decisions, identifying and addressing missing data is
critical.
References retrieved from Nasjonalt vitenarkiv often lacked titles and abstracts. So that these fields are handled as missing values rather than as literal text, all “NULL” strings were converted to NA.
# sum(coverage_references$abstract == "NULL", na.rm = TRUE) # abstracts missing = 690
# coverage_references$abstract[coverage_references$abstract == "NULL" | is.na(coverage_references$abstract)] <- NA
# sum(is.na(coverage_references$abstract)) # count true missing values; comparing with the string "NA" would not count them
#
# coverage_references$title[coverage_references$title == "NULL" | is.na(coverage_references$title)] <- NA
#
# sum(coverage_references$ID == "NULL", na.rm = TRUE) # ID missing = 1208 ref.
# coverage_references$ID[coverage_references$ID == "NULL" | is.na(coverage_references$ID)] <- NA # replaced with NA
Because many references lacked an identification number, every reference was assigned a new unique integer ID (a random permutation of the row numbers) to prevent empty cells from interfering with the screening process.
For references with missing abstracts, we inserted the placeholder text: “No abstract available. Please decide based on title only.”
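The cleaning steps described above can be tried on a toy data frame. A minimal base R sketch, with hypothetical values and sequential IDs used for simplicity. The project code that follows uses dplyr on the real dataset.

```r
# Toy stand-in for the imported references (hypothetical values).
refs <- data.frame(
  ID       = c("101", "NULL", NA),
  title    = c("HPV vaccine uptake in Norway", "NULL",
               "Influenza coverage in Sweden"),
  abstract = c("Registry-based estimate of uptake.", "NULL", NA),
  stringsAsFactors = FALSE
)

# Convert literal "NULL" strings to true missing values in every column.
refs[] <- lapply(refs, function(x) {
  x[!is.na(x) & x == "NULL"] <- NA
  x
})

# Count missing values with is.na(); comparing against the string "NA"
# would miss them, because NA == "NA" evaluates to NA, not TRUE.
n_missing_abstract <- sum(is.na(refs$abstract))

# Give every reference a unique integer ID (sequential here, for simplicity).
refs$ID <- seq_len(nrow(refs))

# Fill missing abstracts with the placeholder instruction for the model.
placeholder <- "No abstract available. Please decide based on title only."
refs$abstract[is.na(refs$abstract)] <- placeholder
```

In this toy example two abstracts and one title are missing after the conversion, and no abstract is left empty once the placeholder is inserted.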
# coverage_references <- coverage_references %>%
#   # Replace ID with a random permutation of the row numbers (unique integers)
#   mutate(ID = sample(row_number())) %>%
#   # Coerce the variables needed for tabscreen_gpt to character vectors
#   mutate(
#     title = as.character(unlist(title)),
#     abstract = as.character(unlist(abstract))
#   ) %>%
#   # Replace NA/NULL in abstract with the placeholder instruction
#   mutate(
#     abstract = if_else(is.na(abstract) | abstract == "NULL",
#                        "No abstract available. Please decide based on title only.",
#                        abstract)
#   ) %>%
#   # Select the variables for tabscreen_gpt
#   select(ID, title, abstract)
#
# saveRDS(coverage_references, "coverage_references.rds") # save the cleaned dataset
The cleaned dataset was then processed using tabscreen_gpt. We ran
GPT-5 mini (OpenAI) from a local computer at the Norwegian Institute
of Public Health on Tuesday, 12 May 2026.
# coverage_screening <- tabscreen_gpt(
#   data = coverage_references,
#   prompt = prompt_coverage1,
#   studyid = ID,
#   title = title,
#   abstract = abstract,
#   model = "gpt-5-mini",
#   reps = 1,
#   decision_description = TRUE
# )
AIscreenR included 453 references and excluded 1086 references.
# # Step 1. Count the number of included and excluded studies based on the AI screening results
# coverage_screening$answer_data %>%
#   count(decision_binary)
# saveRDS(coverage_screening, "coverage_screening.rds")
One researcher (Angela Susan Labberton) reviewed a random 10% sample
of the 1086 excluded references as a quality-control check that they
were indeed irrelevant. A larger check was not possible due to time
constraints. No relevant references were found among the sampled
references excluded by AIscreenR.
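How the 10% sample was drawn is not shown here. One way such a draw could be made reproducible is sketched below in base R on a toy data frame. The seed, the rounding rule, and the object names are assumptions, not the team's recorded procedure.

```r
# Toy stand-in for the excluded references (hypothetical IDs; the real object
# would be the rows of answer_data with decision_binary == 0).
excluded <- data.frame(studyid = seq_len(1086))

# Arbitrary seed (an assumption) so that the same sample can be redrawn later.
set.seed(20260512)

# Draw 10% of the rows without replacement, rounding up to whole references.
n_sample  <- ceiling(0.10 * nrow(excluded))
qc_sample <- excluded[sample(nrow(excluded), n_sample), , drop = FALSE]

nrow(qc_sample)
```

With 1086 excluded references and rounding up, this draws 109 references for manual review.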
Exporting the references included by AIscreenR.
# Extracting the references included by AIscreenR
# included_coverage <- coverage_screening$answer_data %>%
#   filter(decision_binary == 1)
#
# view(included_coverage)
# colnames(included_coverage)
#
# saveRDS(included_coverage, "included_coverage.rds")
# writexl::write_xlsx(included_coverage, "included_coverage.xlsx")
Exporting the references excluded by AIscreenR.
# # Extracting the references excluded by AIscreenR
# excluded_coverage <- coverage_screening$answer_data %>%
#   filter(decision_binary == 0)
#
# saveRDS(excluded_coverage, "excluded_coverage.rds")
# writexl::write_xlsx(excluded_coverage, "excluded_coverage.xlsx")
The 453 references included by AIscreenR are exported to
an RIS file, which will be imported into Rayyan for formal
screening.
# # Download a RIS file with the included studies for screening in Rayyan.
# #
# # Keep only the essential core columns: ID, studyid, title, abstract, detailed_description
# included_coverage_Rayyan <- included_coverage
# saveRDS(included_coverage_Rayyan, "included_coverage_Rayyan.rds") # saving the object
#
# write_refs(included_coverage_Rayyan, format = "ris", file = "included_coverage_Rayyan.ris") # saving the ris file
#
# # saving the file as CSV
# write.csv(included_coverage_Rayyan, "included_coverage_Rayyan.csv")
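The comment in the export code mentions keeping only the core columns, while the assignment shown passes the full object through unchanged. A minimal base R sketch of such a selection on a toy data frame follows. The values and the extra columns are hypothetical, and whether all five columns exist in the real output is an assumption.

```r
# Toy stand-in for the included references (hypothetical values and columns).
included <- data.frame(
  ID = 1:2,
  studyid = 1:2,
  title = c("Study A", "Study B"),
  abstract = c("Abstract A", "Abstract B"),
  detailed_description = c("Meets all inclusion criteria.",
                           "Meets all inclusion criteria."),
  decision_binary = c(1, 1),
  model = "gpt-5-mini",
  stringsAsFactors = FALSE
)

# Columns named in the comment of the export code.
core_cols <- c("ID", "studyid", "title", "abstract", "detailed_description")

# Fail early if an expected column is absent, then keep only the core columns.
stopifnot(all(core_cols %in% names(included)))
included_core <- included[, core_cols, drop = FALSE]
```

Trimming the object before export keeps the RIS and CSV files limited to the fields needed for screening in Rayyan.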