Collection Development Priorities

Print vs. Electronic Materials in U.S. Public Libraries

Tara Reynolds & Nicholas Meister

2025-11-12

Project Overview

Title: Collection Development Priorities: Print vs. Electronic Materials

Team:

Tara Reynolds (Data Visualization)
Nicholas Meister (Data Wrangling & Analysis)

Course: LIS 4210 Data Visualization, Fall 2025

Date: November 12, 2025

Dataset

Source: Institute of Museum and Library Services (IMLS)
Public Libraries Survey (PLS) FY2022

Size: 9,248 library systems across all 50 states + DC + territories
Variables: 192 fields including expenditures, collections, services, and demographics

Key Variables:

PRMATEXP - Print materials expenditure
ELMATEXP - Electronic materials expenditure
OTHMATEX - Other materials expenditure (audio, video, etc.)
POPU_LSA - Population of legal service area
LOCALE_ADD - Community type (urban/suburban/town/rural)
STABR - State abbreviation

Audience & Research Questions

Audience:

Collection development librarians making purchasing decisions
Library directors planning budgets and strategic priorities
State library agencies setting policy and allocating grants
Publishers and vendors understanding market trends
LIS educators preparing future librarians

Research Questions:

How do materials budget priorities vary across community types (urban vs. rural)?
Which states lead in electronic materials investment?
What is the relationship between library size, location, and digital collection strategy?

Data Wrangling: Loading & Initial Setup

Challenge 1: Loading 9,000+ observations with 192 variables

# Load essential packages
library(tidyverse)  # Data manipulation and visualization
library(janitor)    # Clean variable names
library(scales)     # Format numbers and currency

# Load data with strategic parameters
pls_2022_raw <- read_csv("data/PLS_FY22_AE_pud22i.csv", 
                         guess_max = 50000,      # Examine more rows
                         show_col_types = FALSE) # Suppress messages

# Standardize variable names
pls_2022_raw <- pls_2022_raw %>%
  clean_names()  # Converts to lowercase_with_underscores

Why guess_max = 50000?

By default, read_csv() guesses each column's type from at most 1,000 rows. Setting guess_max = 50000 exceeds the file's 9,248 rows, so every row informs type detection, and a column whose sampled rows happen to be blank or filled with special codes is less likely to be mistyped.
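An alternative to raising guess_max is to declare types for the columns the analysis depends on and let readr guess the rest. The sketch below is not the approach used in this project. It reads a tiny inline CSV with made-up values instead of the real file, and the column names mirror the key variables listed earlier.

```r
library(readr)

# Tiny inline stand-in for the PLS file (values are invented for illustration)
csv_text <- I("STABR,PRMATEXP,ELMATEXP,POPU_LSA
AK,1200,-1,950
AL,53000,8100,41000
")

# Declare types for the key fields; any other column is still guessed
demo <- read_csv(
  csv_text,
  col_types = cols(
    STABR    = col_character(),
    PRMATEXP = col_double(),
    ELMATEXP = col_double(),
    .default = col_guess()
  )
)

str(demo)
```

Explicit types make the import independent of how many rows are sampled, at the cost of maintaining the list when the survey layout changes.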

Data Wrangling: Handling Missing Data Codes

Challenge 2: IMLS uses negative codes instead of NA

-1 = Not applicable
-3 = Suppressed for confidentiality
-4 = Not available

pls_2022 <- pls_2022_raw %>%
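  # Convert negative sentinel codes to NA in numeric columns, skipping
  # coordinate columns where negative values are real data (the names
  # longitud and latitude are assumed; any_of() ignores absent columns)
  mutate(across(where(is.numeric) & !any_of(c("longitud", "latitude")),
                ~ ifelse(.x < 0, NA_real_, .x)))

Breaking down the code:

mutate() - Modify columns
across() - Apply function to multiple columns
where(is.numeric) & !any_of(...) - Select numeric columns, except coordinates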
~ ifelse(.x < 0, NA_real_, .x) - If negative, replace with NA; otherwise keep original

Result: One line cleans 100+ numeric variables systematically
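One caveat with a blanket negative-to-NA rule is that it also erases columns whose negative values are real, such as longitudes in the western hemisphere. The toy sketch below compares the blanket rule with one that skips coordinate columns. All values are hypothetical, and the column name longitud is an assumption about how the cleaned file names that field.

```r
library(dplyr)

toy <- tibble(
  prmatexp = c(1200, -1, 53000),      # -1 is a sentinel code
  elmatexp = c(-3, 400, 8100),        # -3 is a sentinel code
  longitud = c(-149.9, -86.3, -71.1)  # legitimate negative values
)

# Blanket rule: every negative number becomes NA, including longitudes
blanket <- toy %>%
  mutate(across(where(is.numeric), ~ ifelse(.x < 0, NA_real_, .x)))

# Guarded rule: same recode, but coordinate columns are left alone
guarded <- toy %>%
  mutate(across(where(is.numeric) & !any_of(c("longitud", "latitude")),
                ~ ifelse(.x < 0, NA_real_, .x)))

colSums(is.na(blanket))  # NA counts per column under the blanket rule
colSums(is.na(guarded))  # NA counts per column under the guarded rule
```

Comparing the two NA counts shows whether any column lost values that were never sentinel codes.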

Data Wrangling: Creating Calculated Variables

Challenge 3: Computing percentages and per-capita metrics

pls_2022 <- pls_2022 %>%
  # Filter to libraries with complete spending data
  filter(!is.na(prmatexp), !is.na(elmatexp)) %>%
  
  mutate(
    # Total materials expenditure (handle missing 'other' category)
    total_matexp = prmatexp + elmatexp + coalesce(othmatex, 0),
    
    # Calculate percentages
    pct_print = prmatexp / total_matexp * 100,
    pct_electronic = elmatexp / total_matexp * 100,
    pct_other = coalesce(othmatex, 0) / total_matexp * 100,
    
    # Per capita metrics with division-by-zero protection
    elec_per_capita = if_else(popu_lsa > 0, 
                               elmatexp / popu_lsa, 
                               NA_real_)
  )

Key functions:

coalesce(x, 0) - Use x where it is not NA; otherwise use 0
if_else() - Returns NA instead of Inf or NaN when population is zero (R does not raise an error on division by zero)
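To make the arithmetic concrete, here is a small worked example on two hypothetical libraries. The numbers are invented; the column names and formulas follow the code above.

```r
library(dplyr)

toy <- tibble(
  prmatexp = c(6000, 3000),
  elmatexp = c(3000, 1000),
  othmatex = c(1000, NA),  # second library did not report 'other'
  popu_lsa = c(5000, 0)    # zero population triggers the guard
)

result <- toy %>%
  mutate(
    # NA in 'other' is treated as 0 so the total is still defined
    total_matexp    = prmatexp + elmatexp + coalesce(othmatex, 0),
    pct_print       = prmatexp / total_matexp * 100,
    pct_electronic  = elmatexp / total_matexp * 100,
    # Per-capita value is NA rather than Inf when population is 0
    elec_per_capita = if_else(popu_lsa > 0, elmatexp / popu_lsa, NA_real_)
  )

result
```

The first library has a total of 10,000, so print is 60% and electronic 30%. The second has a total of 4,000 (the missing 'other' counts as 0), giving 75% print and 25% electronic, with an NA per-capita value.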

Data Wrangling: Locale Classification

Challenge 4: Variable names vary across survey years

# Flexible detection of locale variable
locale_col <- names(pls_2022_raw)[
  str_detect(names(pls_2022_raw), "locale")]

# Reclassify NCES locale codes into meaningful categories
# (first digit: 1 = City, 2 = Suburb, 3 = Town, 4 = Rural)
pls_2022 <- pls_2022 %>%
  mutate(
    locale_label = case_when(
      .data[[locale_col[1]]] %in% c(11, 12, 13, 21, 22, 23) ~ "Urban/Suburban",
      .data[[locale_col[1]]] %in% c(31, 32, 33) ~ "Town",
      .data[[locale_col[1]]] %in% c(41, 42, 43) ~ "Rural",
      TRUE ~ "Unknown"
    )
  ) %>%
  filter(locale_label != "Unknown", total_matexp > 0)

IMLS Locale Codes:

11, 12, 13 = City (large, midsize, small)
21, 22, 23 = Suburb (large, midsize, small)
31, 32, 33 = Town (fringe, distant, remote)
41, 42, 43 = Rural (fringe, distant, remote)
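Because the final filter() silently drops every "Unknown" row, it is worth cross-tabulating raw codes against assigned labels before filtering. The sketch below does this on toy data. It assumes the NCES two-digit locale framework, in which the first digit gives the class (1 = City, 2 = Suburb, 3 = Town, 4 = Rural); the column name locale_code and the value 99 are hypothetical.

```r
library(dplyr)

# Hypothetical raw codes, including one value outside the framework
toy <- tibble(locale_code = c(11, 13, 21, 32, 41, 43, 99))

labelled <- toy %>%
  mutate(
    locale_class = locale_code %/% 10,  # first digit of the two-digit code
    locale_label = case_when(
      locale_class %in% c(1, 2) ~ "Urban/Suburban",
      locale_class == 3 ~ "Town",
      locale_class == 4 ~ "Rural",
      TRUE ~ "Unknown"
    )
  )

# Cross-tabulate raw codes against labels before filtering anything out
labelled %>% count(locale_label, locale_code)

# Codes that filter(locale_label != "Unknown") would remove
dropped <- labelled %>%
  filter(locale_label == "Unknown") %>%
  distinct(locale_code)

dropped
```

If dropped lists any valid locale code, the recode is excluding libraries that belong in the analysis.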

Budget Priorities by Community Type

Key Insight: Urban/suburban libraries allocate a larger share of their materials budgets to electronic resources than rural libraries do, but every community type still spends the majority of its budget on print.
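The chart for this slide is not reproduced here. One plausible way to build the summary behind it is sketched below on toy data with invented values. It reports both a pooled share (total electronic dollars over total materials dollars within each community type) and the mean of library-level percentages, since the two can differ when library sizes vary. Which measure the original chart used is not stated, so treat this as an assumption.

```r
library(dplyr)

toy <- tibble(
  locale_label = c("Urban/Suburban", "Urban/Suburban", "Rural", "Rural"),
  prmatexp     = c(600000, 40000, 9000, 3000),
  elmatexp     = c(400000, 10000, 1000, 0),
  total_matexp = prmatexp + elmatexp
)

by_locale <- toy %>%
  group_by(locale_label) %>%
  summarise(
    n_libraries     = n(),
    # Pooled share: large libraries carry more weight
    pooled_pct_elec = sum(elmatexp) / sum(total_matexp) * 100,
    # Mean of percentages: every library counts equally
    mean_pct_elec   = mean(elmatexp / total_matexp * 100),
    .groups = "drop"
  )

by_locale
```

In this toy data the pooled urban/suburban share (about 39%) is higher than the mean of percentages (30%) because the largest system is also the most electronic-heavy.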

Top States in Electronic Investment

Key Insight: The leading states allocate a markedly larger share of their materials budgets to electronic resources than other states, and their geographic clustering suggests regional consortia effects.

The Balance: Print vs. Electronic

Key Insight: Most libraries cluster below the diagonal (print-dominant), but large urban systems approach or exceed parity.

Per Capita Spending Patterns

Unexpected Finding: Town libraries show relatively high per-capita materials spending, possibly due to smaller populations with stable tax bases.

State-Level Comparison

Key Insight: Even among the largest library systems, the electronic share of the materials budget varies widely from state to state.

The Story: Strategic Tensions in Collection Development

What We Found:

Universal Print Majority: Even leading states allocate 60-70% to print materials
Urban-Rural Divide: Urban libraries invest significantly more in electronic resources
Scale Matters: Large libraries can afford balanced portfolios; small libraries face either/or choices
State Variation: Significant differences in strategic priorities across states

So What?

For Librarians: Strategy must align with both user needs AND community type realities
For Administrators: Electronic transition requires sustained multi-year investment
For States: Geographic disparities suggest need for consortial purchasing power
For Vendors: Market segmentation by size and locale is essential
For Policy: Digital equity requires addressing content licensing costs, not just broadband

Thank You

Tara Reynolds & Nicholas Meister

LIS 4210 Data Visualization
Fall 2025