Library Event Attendance and Performance Analysis

Author

Library Analytics Team

Published

July 23, 2025

Executive Summary

This analysis examines library event attendance patterns and performance using comprehensive event data, focusing on actual registrations as the primary success metric. We explore how factors such as library card requirements, event recurrence, subjects, and timing influence registration rates across different time periods.

Data Loading and Preparation

Show code
# Load required libraries
library(tidyverse)
library(janitor)
library(lubridate)
library(scales)
library(plotly)
library(DT)
library(corrplot)
library(viridis)
library(patchwork)

# Load and prepare the data
event_data <- read_csv("event_data_with_analysis.csv") %>%
  clean_names() %>%
  mutate(
    # Parse event dates
    event_date = as.Date(ymd_hms(event_start_time)),
    event_year = year(event_date),
    event_month = month(event_date),
    event_day_of_week = wday(event_date, label = TRUE),
    
    # Create library card requirement indicator
    has_library_card = case_when(
      str_detect(library_card_requirement_status, "AHML Cardholders Only") ~ "Yes",
      TRUE ~ "No"
    ),
    
    # Create recurring event type
    recurring_type = case_when(
      str_detect(tolower(event_title), "weekly") ~ "Weekly",
      str_detect(tolower(event_title), "monthly") ~ "Monthly", 
      str_detect(tolower(event_title), "annual") ~ "Annual",
      str_detect(tolower(event_title), "semi-monthly") ~ "Semi-Monthly",
      TRUE ~ "One-Time"
    ),
    
    # Clean numeric columns
    expected_attendees = as.numeric(expected_attendees),
    actual_registrations = as.numeric(actual_registrations)
  ) %>%
  filter(!is.na(event_date)) # Remove rows with invalid dates

# Display data structure
glimpse(event_data)
Rows: 12,630
Columns: 40
$ event_id                        <dbl> 184282, 180161, 191922, 179904, 180460…
$ reservation_name                <chr> "1719168993 1731181732 1743885215 1743…
$ event_start_time                <dttm> 2025-07-02 23:30:00, 2025-07-02 20:00…
$ event_end_time                  <dttm> 2025-07-03 00:00:00, 2025-07-02 22:00…
$ event_date                      <date> 2025-07-02, 2025-07-02, 2025-07-02, 2…
$ event_year                      <dbl> 2025, 2025, 2025, 2025, 2025, 2025, 20…
$ event_month                     <dbl> 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,…
$ event_quarter                   <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,…
$ event_day_of_week               <ord> Wed, Wed, Wed, Wed, Wed, Wed, Wed, Wed…
$ event_day_of_week_number        <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2,…
$ event_hour                      <dbl> 23, 20, 20, 18, 18, 18, 16, 15, 15, 15…
$ event_duration_minutes          <dbl> 30, 120, 120, 240, 120, 60, 540, 30, 6…
$ time_of_day_bucket              <chr> "Evening", "Early Evening", "Early Eve…
$ event_title                     <chr> "Wednesday Fun Night", "SVS Book Buddi…
$ event_description               <chr> "<p>Looking for something to do in the…
$ event_subjects                  <chr> "Storytime", "No Subjects Listed", "No…
$ event_room_name                 <chr> "Lindsey Room", "Lindsey Room", "Train…
$ room_type                       <chr> "staff_rooms", "staff_rooms", "staff_r…
$ room_status                     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ expected_attendees              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ attendee_count_bucket           <chr> "Small (0-25)", "Small (0-25)", "Small…
$ actual_registrations            <dbl> 0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 5, 0…
$ actual_registration_bucket      <chr> "Small (0-25)", "Small (0-25)", "Small…
$ registration_types              <chr> "No Registration Type", "No Registrati…
$ library_card_requirement_raw    <chr> "0", "Not Specified", "0", "0", "0", "…
$ library_card_requirement_status <chr> "No Library Card Required", "Unknown",…
$ event_description_clean         <chr> "looking for something to do in the ev…
$ tavg_degrees_fahrenheit         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ tmax_degrees_fahrenheit         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ tmin_degrees_fahrenheit         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ prcp_inches                     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ snow_inches                     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ snwd_inches                     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ macro_theme                     <chr> "Early Literacy / Youth Programming", …
$ micro_theme                     <chr> NA, NA, NA, NA, NA, "ESL - Adult", NA,…
$ temp_bucket                     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ attendance_gap                  <dbl> 0, 0, 0, 0, 0, -10, 0, 0, 0, 0, 0, -5,…
$ gap_ratio                       <dbl> NA, NA, NA, NA, NA, Inf, NA, NA, NA, N…
$ has_library_card                <chr> "No", "No", "No", "No", "No", "No", "N…
$ recurring_type                  <chr> "One-Time", "One-Time", "One-Time", "O…
Show code
# Basic data summary
cat("Dataset Overview:\n")
Dataset Overview:
Show code
cat("Total Events:", nrow(event_data), "\n")
Total Events: 12630 
Show code
cat("Date Range:", min(event_data$event_date, na.rm = TRUE), "to", max(event_data$event_date, na.rm = TRUE), "\n")
Date Range: 18628 to 20271 
Show code
cat("Years Covered:", paste(sort(unique(event_data$event_year)), collapse = ", "), "\n")
Years Covered: 2021, 2022, 2023, 2024, 2025 
Show code
cat("Unique Event Subjects:", length(unique(event_data$event_subjects[!is.na(event_data$event_subjects)])), "\n")
Unique Event Subjects: 144 
Show code
cat("Library Card Required Events:", sum(event_data$has_library_card == "Yes", na.rm = TRUE), "\n")
Library Card Required Events: 2815 
Show code
cat("Open Events:", sum(event_data$has_library_card == "No", na.rm = TRUE), "\n")
Open Events: 9815 

Expected vs Actual Registrations Comparison

Show code
# Demonstrate the difference between expected and actual registrations
comparison_data <- event_data %>%
  filter(!is.na(expected_attendees), !is.na(actual_registrations)) %>%
  filter(expected_attendees <= 200, actual_registrations <= 200) # Remove outliers for clearer visualization

# Scatter plot comparison
comparison_data %>%
  ggplot(aes(x = expected_attendees, y = actual_registrations)) +
  geom_point(alpha = 0.6, color = "steelblue") +
  geom_smooth(method = "lm", se = FALSE, color = "red", linetype = "dashed") +
  geom_abline(intercept = 0, slope = 1, color = "darkgreen", linetype = "solid", linewidth = 1) +
  labs(title = "Expected Attendees vs Actual Registrations",
       subtitle = "Red line = trend, Green line = perfect prediction",
       x = "Expected Attendees", 
       y = "Actual Registrations") +
  theme_minimal()

Show code
# Calculate correlation
correlation <- cor(comparison_data$expected_attendees, comparison_data$actual_registrations, use = "complete.obs")
cat("Correlation between Expected and Actual:", round(correlation, 3), "\n")
Correlation between Expected and Actual: 0.136 
Show code
cat("Note: Analysis will focus on actual_registrations as the primary metric.\n")
Note: Analysis will focus on actual_registrations as the primary metric.

Global Analysis

Overall Performance Metrics

Show code
# Calculate overall metrics focusing on actual registrations
overall_metrics <- event_data %>%
  summarise(
    total_events = n(),
    total_registrations = sum(actual_registrations, na.rm = TRUE),
    avg_registrations = mean(actual_registrations, na.rm = TRUE),
    median_registrations = median(actual_registrations, na.rm = TRUE),
    events_with_data = sum(!is.na(actual_registrations))
  ) %>%
  mutate(across(where(is.numeric) & !c(total_events, total_registrations, events_with_data), round, 2))

knitr::kable(
  overall_metrics,
  caption = "Overall Event Performance Metrics",
  col.names = c("Total Events", "Total Registrations", "Avg Registrations", "Median Registrations", "Events with Data")
)
Overall Event Performance Metrics
Total Events Total Registrations Avg Registrations Median Registrations Events with Data
12630 156437 12.39 1 12630

Library Card Requirement Analysis

Show code
# Check if we have the necessary data
if(all(c("has_library_card", "actual_registrations") %in% names(event_data))) {
  
  # Comprehensive library card analysis
  card_metrics <- event_data %>%
    filter(!is.na(has_library_card), !is.na(actual_registrations)) %>%
    group_by(has_library_card) %>%
    summarise(
      event_count = n(),
      total_registrations = sum(actual_registrations, na.rm = TRUE),
      avg_registrations = mean(actual_registrations, na.rm = TRUE),
      median_registrations = median(actual_registrations, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    mutate(
      pct_events = round(100 * event_count / sum(event_count), 1),
      pct_registrations = round(100 * total_registrations / sum(total_registrations), 1)
    )
  
  knitr::kable(
    card_metrics,
    caption = "Event Performance by Library Card Requirement",
    col.names = c("Library Card Required", "Event Count", "Total Reg.", "Avg Reg.", "Median Reg.", "% Events", "% Registrations")
  )
  
  # Create charts for total registrations and event count
  p1 <- card_metrics %>%
    ggplot(aes(x = has_library_card, y = total_registrations, fill = has_library_card)) +
    geom_col(width = 0.6) +
    scale_fill_viridis_d(name = "Library Card\nRequired") +
    labs(title = "Total Registrations by Library Card Requirement",
         x = "Library Card Required", y = "Total Registrations") +
    theme_minimal() +
    geom_text(aes(label = comma(total_registrations)), vjust = -0.5, size = 4)
  
  p2 <- card_metrics %>%
    ggplot(aes(x = has_library_card, y = event_count, fill = has_library_card)) +
    geom_col(width = 0.6) +
    scale_fill_viridis_d(name = "Library Card\nRequired") +
    labs(title = "Number of Events by Library Card Requirement",
         x = "Library Card Required", y = "Number of Events") +
    theme_minimal() +
    geom_text(aes(label = comma(event_count)), vjust = -0.5, size = 4)
  
  # Use patchwork if available, otherwise print separately
  if(requireNamespace("patchwork", quietly = TRUE)) {
    print(p1 / p2)
  } else {
    print(p1)
    print(p2)
  }
}

Overall Attendance Analysis

Highest Attended Events

Show code
# Top 20 highest attended events
if("actual_registrations" %in% names(event_data)) {
  
  # Get available columns for the display
  display_cols <- c("event_title", "event_date", "actual_registrations")
  if("event_subjects" %in% names(event_data)) display_cols <- c(display_cols, "event_subjects")
  if("has_library_card" %in% names(event_data)) display_cols <- c(display_cols, "has_library_card")
  
  top_events <- event_data %>%
    filter(!is.na(actual_registrations)) %>%
    arrange(desc(actual_registrations)) %>%
    slice_head(n = 20) %>%
    select(all_of(display_cols)) %>%
    mutate(event_date = as.character(event_date))
  
  if(nrow(top_events) > 0) {
    DT::datatable(
      top_events,
      caption = "Top 20 Highest Attended Events",
      options = list(pageLength = 10, scrollX = TRUE)
    )
  }
}

Overall Attendance Statistics

Show code
# Detailed attendance statistics
if("actual_registrations" %in% names(event_data)) {
  
  attendance_stats <- event_data %>%
    filter(!is.na(actual_registrations)) %>%
    summarise(
      min_registrations = min(actual_registrations),
      q25_registrations = quantile(actual_registrations, 0.25),
      median_registrations = median(actual_registrations),
      q75_registrations = quantile(actual_registrations, 0.75),
      max_registrations = max(actual_registrations),
      mean_registrations = mean(actual_registrations),
      sd_registrations = sd(actual_registrations),
      events_over_50 = sum(actual_registrations > 50),
      events_over_100 = sum(actual_registrations > 100)
    ) %>%
    mutate(across(where(is.numeric) & !all_of(c("events_over_50", "events_over_100")), ~ round(.x, 2)))
  
  knitr::kable(
    attendance_stats,
    caption = "Registration Distribution Statistics",
    col.names = c("Min", "Q1", "Median", "Q3", "Max", "Mean", "Std Dev", ">50 Reg.", ">100 Reg.")
  )
  
  # Registration distribution histogram
  p <- event_data %>%
    filter(!is.na(actual_registrations), actual_registrations <= 150) %>%
    ggplot(aes(x = actual_registrations))
  
  # Add fill by library card if available
  if("has_library_card" %in% names(event_data)) {
    p <- p + geom_histogram(aes(fill = has_library_card), bins = 30, alpha = 0.7, position = "identity") +
      scale_fill_viridis_d(name = "Library Card\nRequired")
  } else {
    p <- p + geom_histogram(bins = 30, alpha = 0.7, fill = "steelblue")
  }
  
  p <- p + 
    labs(title = "Distribution of Actual Registrations",
         x = "Actual Registrations", y = "Count") +
    theme_minimal()
  
  print(p)
}

Subject and Theme Analysis

Event Subjects Performance

Show code
# Only proceed if event_subjects column exists
if("event_subjects" %in% names(event_data) && "actual_registrations" %in% names(event_data)) {
  
  # Comprehensive subject analysis
  subject_performance <- event_data %>%
    filter(!is.na(event_subjects), event_subjects != "", !is.na(actual_registrations)) %>%
    separate_rows(event_subjects, sep = ";") %>%
    mutate(event_subjects = str_trim(event_subjects)) %>%
    filter(event_subjects != "") %>%
    group_by(event_subjects) %>%
    summarise(
      event_count = n(),
      total_registrations = sum(actual_registrations, na.rm = TRUE),
      avg_registrations = mean(actual_registrations, na.rm = TRUE),
      median_registrations = median(actual_registrations, na.rm = TRUE),
      max_registrations = max(actual_registrations, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    arrange(desc(total_registrations)) %>%
    mutate(across(where(is.numeric) & !all_of(c("event_count", "total_registrations")), ~ round(.x, 2)))
  
  if(nrow(subject_performance) > 0) {
    # Top 15 subjects by total registrations
    top_subjects <- subject_performance %>% slice_head(n = 15)
    
    knitr::kable(
      top_subjects,
      caption = "Top 15 Event Subjects by Total Registrations",
      col.names = c("Subject", "Event Count", "Total Reg.", "Avg Reg.", "Median Reg.", "Max Reg.")
    )
    
    # Visualization: Top subjects by performance metrics
    p1 <- top_subjects %>%
      slice_head(n = 10) %>%
      ggplot(aes(x = reorder(event_subjects, total_registrations), y = total_registrations)) +
      geom_col(fill = "steelblue", alpha = 0.8) +
      coord_flip() +
      labs(title = "Top 10 Subjects by Total Registrations",
           x = "Event Subject", y = "Total Registrations") +
      theme_minimal() +
      theme(axis.text.y = element_text(size = 9))
    
    p2 <- top_subjects %>%
      slice_head(n = 10) %>%
      ggplot(aes(x = reorder(event_subjects, avg_registrations), y = avg_registrations)) +
      geom_col(fill = "darkgreen", alpha = 0.8) +
      coord_flip() +
      labs(title = "Top 10 Subjects by Average Registrations",
           x = "Event Subject", y = "Average Registrations") +
      theme_minimal() +
      theme(axis.text.y = element_text(size = 9))
    
    # Use patchwork if available
    if(requireNamespace("patchwork", quietly = TRUE)) {
      print(p1 / p2)
    } else {
      print(p1)
      print(p2)
    }
  }
} else {
  cat("Event subjects or actual registrations data not available for analysis.\n")
}

Year-by-Year Analysis

Annual Subject Performance

Show code
# Top 10 subjects per year analysis
yearly_top_subjects <- event_data %>%
  filter(!is.na(event_subjects), event_subjects != "", !is.na(event_year), !is.na(actual_registrations)) %>%
  separate_rows(event_subjects, sep = ";") %>%
  mutate(event_subjects = str_trim(event_subjects)) %>%
  filter(event_subjects != "") %>%
  group_by(event_year, event_subjects) %>%
  summarise(
    event_count = n(),
    total_registrations = sum(actual_registrations, na.rm = TRUE),
    avg_registrations = mean(actual_registrations, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(event_year, desc(total_registrations))

# Create a summary table for each year's top 10
yearly_summary <- yearly_top_subjects %>%
  group_by(event_year) %>%
  slice_head(n = 10) %>%
  ungroup() %>%
  mutate(across(where(is.numeric) & !c(event_year, event_count, total_registrations), round, 2))

DT::datatable(
  yearly_summary,
  caption = "Top 10 Subjects by Total Registrations Per Year",
  colnames = c("Year", "Subject", "Event Count", "Total Reg.", "Avg Reg."),
  options = list(pageLength = 15, scrollX = TRUE),
  filter = 'top'
)

Average Registrations by Subject Per Year

Show code
# Heatmap of average registrations by top subjects and year
top_subjects_list <- head(subject_performance$event_subjects, 15)

heatmap_data <- event_data %>%
  filter(!is.na(event_subjects), event_subjects != "", !is.na(event_year), !is.na(actual_registrations)) %>%
  separate_rows(event_subjects, sep = ";") %>%
  mutate(event_subjects = str_trim(event_subjects)) %>%
  filter(event_subjects %in% top_subjects_list) %>%
  group_by(event_year, event_subjects) %>%
  summarise(avg_registrations = mean(actual_registrations, na.rm = TRUE), .groups = "drop")

heatmap_data %>%
  ggplot(aes(x = factor(event_year), y = reorder(event_subjects, avg_registrations), fill = avg_registrations)) +
  geom_tile(color = "white") +
  scale_fill_viridis_c(name = "Avg Reg.") +
  labs(title = "Average Registrations Heatmap: Top 15 Subjects by Year",
       x = "Year", y = "Subject") +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 9), axis.text.x = element_text(angle = 45, hjust = 1))

Total Registrations by Subject Per Year

Show code
# Stacked bar chart of total registrations by year for top subjects
yearly_totals <- event_data %>%
  filter(!is.na(event_subjects), event_subjects != "", !is.na(event_year), !is.na(actual_registrations)) %>%
  separate_rows(event_subjects, sep = ";") %>%
  mutate(event_subjects = str_trim(event_subjects)) %>%
  filter(event_subjects %in% head(top_subjects_list, 10)) %>%
  group_by(event_year, event_subjects) %>%
  summarise(total_registrations = sum(actual_registrations, na.rm = TRUE), .groups = "drop")

yearly_totals %>%
  ggplot(aes(x = factor(event_year), y = total_registrations, fill = event_subjects)) +
  geom_col(position = "stack") +
  scale_fill_viridis_d(name = "Subject") +
  labs(title = "Total Registrations by Year: Top 10 Subjects (Stacked)",
       x = "Year", y = "Total Registrations") +
  theme_minimal() +
  theme(legend.position = "bottom")

Subject Deep Dive Across Years

All Subjects Trend Analysis

Show code
# Comprehensive trend analysis for all subjects with sufficient data
all_subject_trends <- event_data %>%
  filter(!is.na(event_subjects), event_subjects != "", !is.na(event_year), !is.na(actual_registrations)) %>%
  separate_rows(event_subjects, sep = ";") %>%
  mutate(event_subjects = str_trim(event_subjects)) %>%
  filter(event_subjects != "") %>%
  group_by(event_subjects, event_year) %>%
  summarise(
    event_count = n(),
    total_registrations = sum(actual_registrations, na.rm = TRUE),
    avg_registrations = mean(actual_registrations, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  # Only include subjects that have data in multiple years
  group_by(event_subjects) %>%
  filter(n() > 1) %>%
  ungroup()

# Calculate growth rates and statistics for each subject
subject_growth_analysis <- all_subject_trends %>%
  group_by(event_subjects) %>%
  arrange(event_year) %>%
  summarise(
    years_active = n(),
    first_year = min(event_year),
    last_year = max(event_year),
    first_year_avg = first(avg_registrations),
    last_year_avg = last(avg_registrations),
    total_events = sum(event_count),
    total_reg_all_years = sum(total_registrations),
    .groups = "drop"
  ) %>%
  mutate(
    growth_rate = round(100 * (last_year_avg - first_year_avg) / first_year_avg, 2),
    avg_events_per_year = round(total_events / years_active, 1)
  ) %>%
  arrange(desc(total_reg_all_years))

knitr::kable(
  head(subject_growth_analysis, 20),
  caption = "Subject Performance Analysis Across Years (Top 20 by Total Registrations)",
  col.names = c("Subject", "Years Active", "First Year", "Last Year", "First Yr Avg", "Last Yr Avg", 
                "Total Events", "Total Reg.", "Growth Rate %", "Avg Events/Yr")
)
Subject Performance Analysis Across Years (Top 20 by Total Registrations)
Subject Years Active First Year Last Year First Yr Avg Last Yr Avg Total Events Total Reg. Growth Rate % Avg Events/Yr
No Subjects Listed 5 2021 2025 12.185422 11.4461778 4577 46428 -6.07 915.4
Makerplace 5 2021 2025 19.764706 38.5466667 982 25965 95.03 196.4
ESL 5 2021 2025 7.288714 8.4212329 2198 18121 15.54 439.6
Technology Classes 5 2021 2025 14.327957 8.3761468 1547 11831 -41.54 309.4
Music 5 2021 2025 0.000000 162.3076923 73 11697 Inf 14.6
Movies 5 2021 2025 32.727273 36.3157895 126 8434 10.96 25.2
4 5 2021 2025 7.500000 32.6923077 145 7819 335.90 29.0
Books & Authors 5 2021 2025 15.453333 23.7045455 360 7507 53.39 72.0
Senior Center 5 2021 2025 6.862500 5.8915663 1317 6924 -14.15 263.4
3 5 2021 2025 20.296296 14.3750000 230 6556 -29.17 46.0
Business & Nonprofit 5 2021 2025 31.660377 34.7291667 302 6273 9.69 60.4
Jobs & Careers 5 2021 2025 124.900000 21.7435897 170 4124 -82.59 34.0
Civics & Voting 5 2021 2025 83.900000 157.5000000 31 3619 87.72 6.2
Genealogy 5 2021 2025 65.133333 4.4489796 311 3608 -93.17 62.2
One Book One Village 4 2021 2024 30.833333 60.8750000 87 3302 97.43 21.8
Exhibits 5 2021 2025 28.571429 109.3333333 43 2801 282.67 8.6
Storytime 5 2021 2025 9.094937 0.1318681 1246 2636 -98.55 249.2
Personal Finance 4 2022 2025 82.000000 24.5000000 31 1912 -70.12 7.8
Used Book Sales 5 2021 2025 55.735294 0.0000000 105 1895 -100.00 21.0
College & Careers 4 2021 2025 16.888889 42.5000000 24 526 151.64 6.0

Subject Trend Visualizations

Show code
# Create trend lines for top performing subjects
top_trending_subjects <- head(subject_growth_analysis$event_subjects, 8)

trend_data <- all_subject_trends %>%
  filter(event_subjects %in% top_trending_subjects)

# Multi-panel trend visualization
p1 <- trend_data %>%
  ggplot(aes(x = event_year, y = avg_registrations, color = event_subjects)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 3) +
  scale_color_viridis_d(name = "Subject") +
  labs(title = "Average Registrations Trends: Top 8 Subjects",
       x = "Year", y = "Average Registrations") +
  theme_minimal() +
  theme(legend.position = "bottom")

p2 <- trend_data %>%
  ggplot(aes(x = event_year, y = total_registrations, color = event_subjects)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 3) +
  scale_color_viridis_d(name = "Subject") +
  labs(title = "Total Registrations Trends: Top 8 Subjects",
       x = "Year", y = "Total Registrations") +
  theme_minimal() +
  theme(legend.position = "bottom")

p1 / p2

Year-over-Year Comparison Tables

Show code
# Create year-over-year comparison for top subjects
yoy_comparison <- all_subject_trends %>%
  filter(event_subjects %in% head(subject_growth_analysis$event_subjects, 10)) %>%
  select(event_subjects, event_year, avg_registrations) %>%
  pivot_wider(names_from = event_year, values_from = avg_registrations, names_prefix = "Year_") %>%
  arrange(desc(rowSums(select(., -event_subjects), na.rm = TRUE)))

# Replace NA with "-" for better display
yoy_comparison[is.na(yoy_comparison)] <- "-"

DT::datatable(
  yoy_comparison,
  caption = "Year-over-Year Average Registrations Comparison (Top 10 Subjects)",
  options = list(scrollX = TRUE, pageLength = 10)
)

Statistical Analysis and Seasonality

Show code
# Seasonality analysis by month for top subjects
monthly_patterns <- event_data %>%
  filter(!is.na(event_subjects), event_subjects != "", !is.na(event_month), !is.na(actual_registrations)) %>%
  separate_rows(event_subjects, sep = ";") %>%
  mutate(event_subjects = str_trim(event_subjects)) %>%
  filter(event_subjects %in% head(subject_performance$event_subjects, 8)) %>%
  group_by(event_subjects, event_month) %>%
  summarise(
    avg_registrations = mean(actual_registrations, na.rm = TRUE),
    event_count = n(),
    .groups = "drop"
  )

# Monthly patterns heatmap
monthly_patterns %>%
  ggplot(aes(x = factor(event_month), y = reorder(event_subjects, avg_registrations), fill = avg_registrations)) +
  geom_tile(color = "white") +
  scale_fill_viridis_c(name = "Avg Reg.") +
  labs(title = "Monthly Registration Patterns: Top 8 Subjects",
       x = "Month", y = "Subject") +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 10))

Show code
# Growth rate distribution
subject_growth_analysis %>%
  filter(!is.infinite(growth_rate), !is.na(growth_rate)) %>%
  ggplot(aes(x = growth_rate)) +
  geom_histogram(bins = 20, fill = "steelblue", alpha = 0.7) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Distribution of Subject Growth Rates",
       subtitle = "Positive values indicate growth in average registrations",
       x = "Growth Rate (%)", y = "Number of Subjects") +
  theme_minimal()

Enhanced Analysis with Library Card Segmentation

Library Card Segmentation by Subject

Show code
# Subject performance by library card requirement
subject_card_analysis <- event_data %>%
  filter(!is.na(event_subjects), event_subjects != "", !is.na(has_library_card), !is.na(actual_registrations)) %>%
  separate_rows(event_subjects, sep = ";") %>%
  mutate(event_subjects = str_trim(event_subjects)) %>%
  filter(event_subjects %in% head(subject_performance$event_subjects, 10)) %>%
  group_by(event_subjects, has_library_card) %>%
  summarise(
    event_count = n(),
    total_registrations = sum(actual_registrations, na.rm = TRUE),
    avg_registrations = mean(actual_registrations, na.rm = TRUE),
    .groups = "drop"
  )

# Stacked bar chart by subject and library card requirement
subject_card_analysis %>%
  ggplot(aes(x = reorder(event_subjects, total_registrations), y = total_registrations, fill = has_library_card)) +
  geom_col(position = "stack") +
  coord_flip() +
  scale_fill_viridis_d(name = "Library Card\nRequired") +
  labs(title = "Total Registrations by Subject and Library Card Requirement",
       x = "Subject", y = "Total Registrations") +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 9))

Time-Based Patterns with Library Card Segmentation

Show code
# Day of week analysis with library card segmentation
dow_card_analysis <- event_data %>%
  filter(!is.na(event_day_of_week), !is.na(has_library_card), !is.na(actual_registrations)) %>%
  group_by(event_day_of_week, has_library_card) %>%
  summarise(
    event_count = n(),
    total_registrations = sum(actual_registrations, na.rm = TRUE),
    avg_registrations = mean(actual_registrations, na.rm = TRUE),
    .groups = "drop"
  )

# Grouped bar chart
dow_card_analysis %>%
  ggplot(aes(x = reorder(event_day_of_week, total_registrations), y = avg_registrations, fill = has_library_card)) +
  geom_col(position = "dodge") +
  coord_flip() +
  scale_fill_viridis_d(name = "Library Card\nRequired") +
  labs(title = "Average Registrations by Day of Week and Library Card Requirement",
       x = "Day of Week", y = "Average Registrations") +
  theme_minimal()

Show code
# Monthly patterns with library card segmentation
monthly_card_analysis <- event_data %>%
  filter(!is.na(event_month), !is.na(has_library_card), !is.na(actual_registrations)) %>%
  group_by(event_month, has_library_card) %>%
  summarise(
    event_count = n(),
    total_registrations = sum(actual_registrations, na.rm = TRUE),
    avg_registrations = mean(actual_registrations, na.rm = TRUE),
    .groups = "drop"
  )

monthly_card_analysis %>%
  ggplot(aes(x = factor(event_month), y = total_registrations, fill = has_library_card)) +
  geom_col(position = "dodge") +
  scale_fill_viridis_d(name = "Library Card\nRequired") +
  labs(title = "Total Registrations by Month and Library Card Requirement",
       x = "Month", y = "Total Registrations") +
  theme_minimal()

Summary of Key Insights

Show code
# Calculate key statistics for findings
total_events <- nrow(event_data)
total_registrations <- sum(event_data$actual_registrations, na.rm = TRUE)
most_popular_subject <- subject_performance$event_subjects[1]
best_avg_subject <- subject_performance$event_subjects[which.max(subject_performance$avg_registrations)]
card_required_pct <- round(100 * sum(event_data$has_library_card == "Yes", na.rm = TRUE) / 
                          sum(!is.na(event_data$has_library_card)), 1)

# Growth analysis
if(nrow(yearly_top_subjects) > 0) {
  years_span <- max(yearly_top_subjects$event_year) - min(yearly_top_subjects$event_year) + 1
} else {
  years_span <- length(unique(event_data$event_year))
}

cat("=== KEY FINDINGS ===\n\n")
=== KEY FINDINGS ===
Show code
cat("📊 Dataset Overview:\n")
📊 Dataset Overview:
Show code
cat("• Total Events:", comma(total_events), "\n")
• Total Events: 12,630 
Show code
cat("• Total Registrations:", comma(total_registrations), "\n")
• Total Registrations: 156,437 
Show code
cat("• Analysis Period:", years_span, "years\n")
• Analysis Period: 5 years
Show code
cat("• Library Card Required Events:", card_required_pct, "%\n\n")
• Library Card Required Events: 22.3 %
Show code
cat("🏆 Top Performers:\n")
🏆 Top Performers:
Show code
cat("• Highest Volume Subject:", most_popular_subject, "\n")
• Highest Volume Subject: No Subjects Listed 
Show code
cat("• Best Average Performance:", best_avg_subject, "\n\n")
• Best Average Performance: Music 
Show code
cat("📈 Key Patterns:\n")
📈 Key Patterns:
Show code
cat("• Expected vs Actual correlation:", round(correlation, 3), "\n")
• Expected vs Actual correlation: 0.136 
Show code
cat("• Subjects with multi-year data:", nrow(subject_growth_analysis), "\n")
• Subjects with multi-year data: 23 

Strategic Recommendations

📋 Programming Strategy: - Focus Investment: Prioritize resources on subjects showing consistent high performance and positive growth trends - Library Card Impact: Consider the registration patterns between card-required and open events when planning programming mix - Seasonal Optimization: Use monthly patterns to optimize event scheduling for maximum attendance - Subject Diversification: Balance high-performing subjects with emerging topics showing growth potential

📊 Resource Allocation: - Capacity Planning: Use historical registration data to better estimate event capacity needs - Staff Assignment: Allocate experienced staff to high-registration events and subjects - Budget Distribution: Weight budget allocation based on total registration volume and growth trends - Room Assignment: Match room capacity to expected attendance patterns by subject and day

📈 Performance Monitoring: - KPI Tracking: Focus on actual registrations as the primary success metric rather than expected attendees - Growth Tracking: Monitor year-over-year growth rates for early identification of trending subjects - Segmentation Analysis: Continue analyzing library card requirement impact on different subject areas - Seasonal Adjustments: Account for monthly variations when setting registration targets

🎯 Future Opportunities: - Subject Development: Investigate why certain subjects consistently outperform others - Accessibility Analysis: Examine whether library card requirements are limiting attendance for specific demographics - Cross-Subject Promotion: Leverage popular subjects to introduce attendees to emerging topic areas - Data-Driven Scheduling: Use day-of-week and time-of-day patterns to optimize event scheduling


This comprehensive analysis provides actionable insights for data-driven library event programming. The focus on actual registrations and library card segmentation reveals important patterns that can guide strategic decisions for maximizing community engagement and resource efficiency.