Utilizing AIScreenR to select studies on Norwegian vaccination coverage

Jose F Meneses-Echavez

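Prepared by: Jose F. Meneses-Echavez. Collaborators: Angela Susan Labberton, Hinta Meijerink. Section for Airborne, Blood-borne and Sexually Transmitted Infections (Seksjon for luft-, blod- og seksuell smitte), Department of Infection Control and Vaccines (Avdeling for smittevern og vaksine), Norwegian Institute of Public Health (Folkehelseinstituttet), Oslo, Norway.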
Contact: Jose Meneses-Echavez; jose.meneses@fhi.no.

Date: 20.05.2026.

Summary

This document describes how we used AIScreenR to support title and abstract screening of the literature search results. The team jointly developed the screening prompt, and Jose Meneses prepared the input files for AIScreenR and ran the screening functions. The outputs are saved in the project repository: https://doi.org/10.17605/OSF.IO/H2EQY.

The following is a step-by-step description of our workflow for using AIScreenR.

Loading packages

We first load the R packages required to run AIScreenR and then import the project data, which are stored in our internal OneDrive folder. The full file path is specified explicitly so that the dataset is imported correctly.
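The package-loading code itself is not shown here. A minimal sketch, assuming the functions used in this workflow come from AIscreenR (get_api_key(), tabscreen_gpt()), synthesisr (read_refs(), write_refs()), dplyr (the pipe and data verbs) and writexl (write_xlsx()); the document does not name these packages, and check_packages() is a hypothetical helper that reports what is missing before anything is attached.

```r
# Packages assumed to be needed (assumption: read_refs() and write_refs() come from synthesisr)
required_pkgs <- c("AIscreenR", "synthesisr", "dplyr", "writexl")

# Hypothetical helper: return the names of packages that are not installed
check_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing_pkgs <- check_packages(required_pkgs)
if (length(missing_pkgs) > 0) {
  message("Install before running the workflow: ", paste(missing_pkgs, collapse = ", "))
} else {
  # Attach the packages only when every one of them is available
  invisible(lapply(required_pkgs, library, character.only = TRUE))
}
```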

Importing the dataset

## coverage_references <- read_refs("C:\\Users\\JOFM\\OneDrive - Folkehelseinstituttet\\Vaccine Surveillance - Literature search (Jose)\\Dekning_sok.Library.txt")

API authentication

Each user must set up and encrypt their own OpenAI API key to access the language models; get_api_key() then retrieves the key for the session. Further information on this process is available at: https://mikkelvembye.github.io/AIscreenR/

## get_api_key()
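Before a screening run, it can help to confirm that a key is available without printing it. A minimal sketch, assuming the key is held in an environment variable; "CHATGPT_KEY" is an assumed default name (check the AIscreenR documentation for the variable get_api_key() actually reads), and has_api_key() is a hypothetical helper.

```r
# Hypothetical helper: TRUE if a non-empty value is stored in the given environment variable.
# "CHATGPT_KEY" is an assumed variable name; adjust it to your own setup.
has_api_key <- function(env_var = "CHATGPT_KEY") {
  nzchar(Sys.getenv(env_var))
}

if (!has_api_key()) {
  message("No API key found; set one before calling tabscreen_gpt().")
}
```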

Prompting

Jose Meneses drafted the screening prompt and incorporated feedback from the team, which then revised the text for consistency and approved the final version.

## prompt_coverage1 <-"Role: You are an expert epidemiological Researcher specializing in Systematic Reviews. Your task is to perform title and abstract screening with high precision. Task: Evaluate the provided study against the strict eligibility criteria below. You must follow a sequential Negative Screening (Exclusion) followed by Positive Screening (Inclusion) logic. Phase 1: Mandatory Exclusion Criteria - Immediately EXCLUDE the study if it meets ANY of the following: Wrong Focus: Evaluates interventions, clinical trials, public health campaigns, or policies designed to increase coverage (We only want observational/status reporting). Non-Human: The study is not conducted on human populations. Wrong Geography: Focuses on populations entirely outside the UK or Nordic countries (Denmark, Finland, Iceland, Norway, Sweden). Wrong Study Design: Literature, scoping, rapid, mapping, or systematic reviews; modelling studies; seroprevalence studies; lab studies; or vaccine effectiveness/efficacy studies. Wrong Publication Type: Conference abstracts, study protocols, editorials, or commentaries. Phase 2: Inclusion Criteria. If the study passed Phase 1, it must meet BOTH of the following to be INCLUDED: Core Topic: The study must explicitly report, analyze, or measure vaccination coverage/uptake for one or more of the following: Pertussis, Rotavirus, Diphtheria, Tetanus, Poliomyelitis, Hib, Hepatitis B, Pneumococcus, MMR (Measles, Mumps, Rubella), HPV, Tuberculosis, Influenza, Covid-19, Meningitis/Meningococcal, or RSV. Location: The data must explicitly cover the United Kingdom or at least one Nordic country (Denmark, Finland, Iceland, Norway, Sweden).Phase 3: Output Format. Provide your assessment using this exact structure: Decision: [INCLUDE or EXCLUDE] Triggered Criterion: [If excluded, name the specific rule (e.g., Wrong Study Design). If included, state Meets all inclusion criteria.] Rationale: [Provide a concise, evidence-based justification for your decision. 
If excluded, specify exactly which part of the abstract triggered the rule. If included, confirm the specific vaccine and geography identified.]"
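The Phase 3 output format makes each response machine-readable, which is useful for quality control (for example, tabulating which exclusion rule was triggered most often). A minimal base R sketch of a parser for one response written in that layout; parse_screening_response() and the example text are hypothetical, and whether the text returned by tabscreen_gpt() follows this exact layout is an assumption to verify against the actual output.

```r
# Hypothetical helper: split one response written in the prompt's Phase 3 layout
# ("Decision: ... Triggered Criterion: ... Rationale: ...") into its three fields.
parse_screening_response <- function(text) {
  # (?s) lets "." match newlines, so the fields may sit on separate lines
  pattern <- paste0(
    "(?s)^.*?Decision:\\s*(INCLUDE|EXCLUDE)\\s*",
    "Triggered Criterion:\\s*(.*?)\\s*",
    "Rationale:\\s*(.*?)\\s*$"
  )
  if (is.na(text) || !grepl(pattern, text, perl = TRUE)) {
    # The response does not follow the requested layout; flag it for manual review
    return(list(decision = NA_character_, criterion = NA_character_, rationale = NA_character_))
  }
  list(
    decision  = sub(pattern, "\\1", text, perl = TRUE),
    criterion = sub(pattern, "\\2", text, perl = TRUE),
    rationale = sub(pattern, "\\3", text, perl = TRUE)
  )
}

example <- paste(
  "Decision: EXCLUDE",
  "Triggered Criterion: Wrong Study Design",
  "Rationale: The abstract describes a systematic review.",
  sep = "\n"
)
str(parse_screening_response(example))
```

The function handles one response at a time; for a column of responses it can be applied with lapply().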

Restricting the variables in the data

We restrict the dataset to the variables required by AIScreenR. Because the LLM bases each inclusion decision on the title and abstract alone, missing values in these fields must be identified and handled before screening.

# coverage_references <- coverage_references %>%
#   select(
#     ID,
#     title,
#     abstract
#   )

Data cleaning

Checking and handling missing data

References retrieved from Nasjonalt vitenarkiv (NVA) often lacked titles and abstracts, which were stored as the text “NULL”. To optimize tool performance, all “NULL” values in the ID, title and abstract fields were converted to R's missing value, NA.

# sum(coverage_references$abstract == "NULL", na.rm = TRUE) # abstracts missing = 690 
# coverage_references$abstract[coverage_references$abstract == "NULL" | is.na(coverage_references$abstract)] <- NA
# sum(is.na(coverage_references$abstract)) # abstracts now stored as true missing values (NA)
# 
# coverage_references$title[coverage_references$title == "NULL" | is.na(coverage_references$title)] <- NA
# 
# sum(coverage_references$ID == "NULL", na.rm = TRUE) # ID missing = 1208 ref.
# coverage_references$ID[coverage_references$ID == "NULL" | is.na(coverage_references$ID)] <- NA # replaced with "NA"
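The same clean-up can be written once for all three fields. A minimal base R sketch on a small made-up data frame (toy_refs and null_to_na() are hypothetical): the literal string "NULL" is replaced with NA, and missing values are then counted with is.na(), which counts true missing values, whereas comparing with the string "NA" does not.

```r
# Toy data frame standing in for coverage_references (hypothetical values)
toy_refs <- data.frame(
  ID       = c("101", "NULL", "103"),
  title    = c("Measles vaccine uptake in Norway", "HPV coverage in Sweden", "NULL"),
  abstract = c("NULL", "An abstract.", NA),
  stringsAsFactors = FALSE
)

# Replace the string "NULL" with a true missing value, leaving existing NAs untouched
null_to_na <- function(x) {
  x[!is.na(x) & x == "NULL"] <- NA
  x
}

fields <- c("ID", "title", "abstract")
toy_refs[fields] <- lapply(toy_refs[fields], null_to_na)

# Count missing values per field with is.na()
missing_counts <- colSums(is.na(toy_refs[fields]))
print(missing_counts)
```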

Because many references lacked identification numbers, every reference was assigned a new unique ID (a random permutation of the row numbers) so that empty cells would not interfere with the screening process.

For references with missing abstracts, we inserted the placeholder text: “No abstract available. Please decide based on title only.”

# coverage_references <- coverage_references %>%
#   # Assign every reference a new unique ID (a random permutation of the row numbers)
#   mutate(ID = sample(row_number())) %>%
#   # Coerce title and abstract to plain character vectors, as tabscreen_gpt expects text
#   mutate(
#     title = as.character(unlist(title)),
#     abstract = as.character(unlist(abstract))
#   ) %>%
#   # Replace missing (NA or "NULL") abstracts with an instruction to use the title only
#   mutate(
#     abstract = if_else(is.na(abstract) | abstract == "NULL", 
#                        "No abstract available. Please decide based on title only.", 
#                        abstract)
#   ) %>%
#   # Selecting the variables for tabscreen_gpt
#   select(ID, title, abstract)
# 
# saveRDS(coverage_references, "coverage_references.rds") #saving the file
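The ID and placeholder steps can also be written in base R. A minimal sketch, assuming a fixed seed is acceptable so the random IDs can be regenerated (the code shown above calls sample() without set.seed()); the seed value and toy data are hypothetical.

```r
set.seed(2026)  # hypothetical seed; makes the random IDs reproducible

toy_refs <- data.frame(
  title    = c("Title A", "Title B", "Title C"),
  abstract = c("Abstract A", NA, "NULL"),
  stringsAsFactors = FALSE
)

placeholder <- "No abstract available. Please decide based on title only."

# Assign every reference a unique ID: a random permutation of the row numbers
toy_refs$ID <- sample(seq_len(nrow(toy_refs)))

# Replace missing or "NULL" abstracts with the instruction to decide on the title
no_abstract <- is.na(toy_refs$abstract) | toy_refs$abstract == "NULL"
toy_refs$abstract[no_abstract] <- placeholder

# Keep only the three fields passed to tabscreen_gpt()
toy_refs <- toy_refs[, c("ID", "title", "abstract")]
print(toy_refs)
```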

Running the screening process

The cleaned dataset was then screened with the tabscreen_gpt() function, using OpenAI's GPT-5 mini model ("gpt-5-mini"). The screening was run from a local computer at the Norwegian Institute of Public Health on Tuesday, 12 May 2026.

# coverage_screening <-  tabscreen_gpt(
#   data = coverage_references,
#   prompt = prompt_coverage1,
#   studyid = ID,
#   title = title,
#   abstract = abstract,
#   model = "gpt-5-mini",
#   reps = 1,
#   decision_description = TRUE
# )
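Because every request to the model costs time and money, it may be worth confirming before the run that the input has no empty cells or duplicated IDs. A minimal sketch; check_screening_input() is a hypothetical helper, and the column names follow those passed to tabscreen_gpt() above.

```r
# Hypothetical pre-flight check: list problems that could disturb the screening run
check_screening_input <- function(df, fields = c("ID", "title", "abstract"), id_col = "ID") {
  absent <- setdiff(fields, names(df))
  if (length(absent) > 0) {
    return(paste("missing column:", absent))
  }
  problems <- character(0)
  for (f in fields) {
    # Treat NA and blank strings as empty cells
    n_empty <- sum(is.na(df[[f]]) | trimws(as.character(df[[f]])) == "")
    if (n_empty > 0) {
      problems <- c(problems, sprintf("%s: %d empty value(s)", f, n_empty))
    }
  }
  if (anyDuplicated(df[[id_col]]) > 0) {
    problems <- c(problems, paste0(id_col, ": duplicated values"))
  }
  problems
}

ok_df <- data.frame(ID = 1:2, title = c("a", "b"), abstract = c("x", "y"))
print(check_screening_input(ok_df))  # an empty result means no problems were found
```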

Analysis

Of the 1539 references screened, AIScreenR included 453 and excluded 1086.

# # Step 1. Count the number of included and excluded studies based on the AI screening results
# coverage_screening$answer_data %>%
#   count(decision_binary)
# saveRDS(coverage_screening, "coverage_screening.rds")
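The reported figures can be checked by tabulating decision_binary and expressing each count as a share of all screened references. A minimal base R sketch, assuming decision_binary is coded 1 = include and 0 = exclude, as in the filtering code used for the exports; the helper name is hypothetical and the input vector only reproduces the reported totals.

```r
# Hypothetical helper: counts and percentages of include/exclude decisions
summarise_decisions <- function(decision_binary) {
  counts <- table(factor(decision_binary, levels = c(1, 0), labels = c("include", "exclude")))
  data.frame(
    decision = names(counts),
    n        = as.integer(counts),
    percent  = round(100 * as.integer(counts) / sum(counts), 1)
  )
}

# Stand-in vector reproducing the reported totals (453 includes, 1086 excludes)
decision_summary <- summarise_decisions(c(rep(1, 453), rep(0, 1086)))
print(decision_summary)
```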

Quality control

One researcher (Angela Susan Labberton) reviewed a random 10% sample of the 1086 references excluded by AIScreenR to check that they were indeed irrelevant. A larger sample could not be reviewed because of time constraints. No relevant references were found in the reviewed sample.
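A quality-control sample like this can be made reproducible by fixing the random seed and saving the drawn IDs. A minimal sketch, assuming simple random sampling with the sample size rounded up; the document does not state how the sample was drawn, and the seed, helper name and toy data are hypothetical.

```r
set.seed(20260512)  # hypothetical seed; record it so the same sample can be drawn again

# Hypothetical helper: draw a simple random sample of a given fraction of rows
draw_qc_sample <- function(excluded, fraction = 0.10) {
  n <- ceiling(fraction * nrow(excluded))  # round up so the sample covers at least 10%
  excluded[sample(nrow(excluded), n), , drop = FALSE]
}

# Stand-in for the 1086 excluded records (only an ID column is needed here)
toy_excluded <- data.frame(studyid = seq_len(1086))
qc_sample <- draw_qc_sample(toy_excluded)
nrow(qc_sample)
```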

References included by AIScreenR

Exporting the references included by AIScreenR.

# Extracting the references included by AIScreenR
# included_coverage <- coverage_screening$answer_data %>%
#   filter(decision_binary == 1)
# 
# view(included_coverage)
# colnames(included_coverage)
# 
# saveRDS(included_coverage, "included_coverage.rds") 
# writexl::write_xlsx(included_coverage, "included_coverage.xlsx")

References excluded by AIScreenR

Exporting the references excluded by AIScreenR.

# # Extracting the references EXCLUDED by AIScreenR
# excluded_coverage <- coverage_screening$answer_data %>%
#   filter(decision_binary == 0)
# 
# saveRDS(excluded_coverage, "excluded_coverage.rds") 
# writexl::write_xlsx(excluded_coverage, "excluded_coverage.xlsx")
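Because the included and excluded sets are created with two separate filters, a simple check can confirm that they do not overlap and that together they account for every screened record; a record without a decision (a missing decision_binary) would be dropped by both filters. A minimal base R sketch with a hypothetical helper and toy IDs; applying it to the real objects assumes answer_data has a studyid column.

```r
# Hypothetical check: the two exports should be disjoint and together cover all screened IDs
check_partition <- function(all_ids, included_ids, excluded_ids) {
  c(
    disjoint = length(intersect(included_ids, excluded_ids)) == 0,
    complete = setequal(union(included_ids, excluded_ids), all_ids)
  )
}

# Toy example: five screened records, two included and three excluded
print(check_partition(1:5, c(1, 3), c(2, 4, 5)))
```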

Next steps

The 453 references included by AIScreenR are exported to an RIS file, which will be imported into Rayyan for formal screening.

# # Export the included references as an RIS file for screening in Rayyan
# 
# # Keep only the essential core columns (those present): ID, studyid, title, abstract, detailed_description
# included_coverage_Rayyan <- included_coverage %>%
#   select(any_of(c("ID", "studyid", "title", "abstract", "detailed_description")))
# saveRDS(included_coverage_Rayyan, "included_coverage_Rayyan.rds") # save the R object
# 
# write_refs(included_coverage_Rayyan, format = "ris", file = "included_coverage_Rayyan.ris") # save the RIS file
# 
# # Save the same references as a CSV file
# write.csv(included_coverage_Rayyan, "included_coverage_Rayyan.csv")
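For readers unfamiliar with the RIS format that Rayyan imports: each record is a block of two-letter tags, each followed by two spaces, a hyphen and a space, ending with an ER line. The sketch below writes such records in base R purely to illustrate the format; the workflow itself uses write_refs(). The helper name, the choice of tags and the JOUR record type are assumptions.

```r
# Hypothetical minimal RIS writer: one record per row, using only the TY, ID, TI and AB tags
to_ris_lines <- function(df) {
  # RIS fields are single-line, so collapse any line breaks into spaces
  one_line <- function(x) gsub("[\r\n]+", " ", x)
  unlist(lapply(seq_len(nrow(df)), function(i) {
    c(
      "TY  - JOUR",                              # reference type (journal article assumed)
      paste0("ID  - ", df$studyid[i]),
      paste0("TI  - ", one_line(df$title[i])),
      if (!is.na(df$abstract[i])) paste0("AB  - ", one_line(df$abstract[i])),
      "ER  - ",                                  # end of record
      ""
    )
  }))
}

toy <- data.frame(
  studyid  = c(1, 2),
  title    = c("MMR uptake in Norway", "HPV coverage in Denmark"),
  abstract = c("Line one\nline two", NA),
  stringsAsFactors = FALSE
)
ris_lines <- to_ris_lines(toy)
writeLines(ris_lines)
```

The resulting lines could be saved with writeLines(ris_lines, "file.ris").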

___________________________________________________________________________________