Library Event Attendance and Performance Analysis

Author

Library Analytics Team

Published

July 23, 2025

Executive Summary

This analysis examines library event attendance patterns and performance using comprehensive event data, focusing on actual registrations as the primary success metric. We explore how factors such as library card requirements, event recurrence, subjects, and timing influence registration rates across different time periods.

Data Loading and Preparation

Show code

# Load required libraries
library(tidyverse)
library(janitor)
library(lubridate)
library(scales)
library(plotly)
library(DT)
library(corrplot)
library(viridis)
library(patchwork)

# Load and prepare the data
event_data <- read_csv("event_data_with_analysis.csv") %>%
  clean_names() %>%
  mutate(
    # Parse event dates
    event_date = as.Date(ymd_hms(event_start_time)),
    event_year = year(event_date),
    event_month = month(event_date),
    event_day_of_week = wday(event_date, label = TRUE),
    
    # Create library card requirement indicator
    has_library_card = case_when(
      str_detect(library_card_requirement_status, "AHML Cardholders Only") ~ "Yes",
      TRUE ~ "No"
    ),
    
    # Create recurring event type
    recurring_type = case_when(
      str_detect(tolower(event_title), "weekly") ~ "Weekly",
      str_detect(tolower(event_title), "monthly") ~ "Monthly", 
      str_detect(tolower(event_title), "annual") ~ "Annual",
      str_detect(tolower(event_title), "semi-monthly") ~ "Semi-Monthly",
      TRUE ~ "One-Time"
    ),
    
    # Clean numeric columns
    expected_attendees = as.numeric(expected_attendees),
    actual_registrations = as.numeric(actual_registrations)
  ) %>%
  filter(!is.na(event_date)) # Remove rows with invalid dates

# Display data structure
glimpse(event_data)

Rows: 12,630
Columns: 40
$ event_id                        <dbl> 184282, 180161, 191922, 179904, 180460…
$ reservation_name                <chr> "1719168993 1731181732 1743885215 1743…
$ event_start_time                <dttm> 2025-07-02 23:30:00, 2025-07-02 20:00…
$ event_end_time                  <dttm> 2025-07-03 00:00:00, 2025-07-02 22:00…
$ event_date                      <date> 2025-07-02, 2025-07-02, 2025-07-02, 2…
$ event_year                      <dbl> 2025, 2025, 2025, 2025, 2025, 2025, 20…
$ event_month                     <dbl> 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,…
$ event_quarter                   <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,…
$ event_day_of_week               <ord> Wed, Wed, Wed, Wed, Wed, Wed, Wed, Wed…
$ event_day_of_week_number        <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2,…
$ event_hour                      <dbl> 23, 20, 20, 18, 18, 18, 16, 15, 15, 15…
$ event_duration_minutes          <dbl> 30, 120, 120, 240, 120, 60, 540, 30, 6…
$ time_of_day_bucket              <chr> "Evening", "Early Evening", "Early Eve…
$ event_title                     <chr> "Wednesday Fun Night", "SVS Book Buddi…
$ event_description               <chr> "<p>Looking for something to do in the…
$ event_subjects                  <chr> "Storytime", "No Subjects Listed", "No…
$ event_room_name                 <chr> "Lindsey Room", "Lindsey Room", "Train…
$ room_type                       <chr> "staff_rooms", "staff_rooms", "staff_r…
$ room_status                     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ expected_attendees              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ attendee_count_bucket           <chr> "Small (0-25)", "Small (0-25)", "Small…
$ actual_registrations            <dbl> 0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 5, 0…
$ actual_registration_bucket      <chr> "Small (0-25)", "Small (0-25)", "Small…
$ registration_types              <chr> "No Registration Type", "No Registrati…
$ library_card_requirement_raw    <chr> "0", "Not Specified", "0", "0", "0", "…
$ library_card_requirement_status <chr> "No Library Card Required", "Unknown",…
$ event_description_clean         <chr> "looking for something to do in the ev…
$ tavg_degrees_fahrenheit         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ tmax_degrees_fahrenheit         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ tmin_degrees_fahrenheit         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ prcp_inches                     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ snow_inches                     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ snwd_inches                     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ macro_theme                     <chr> "Early Literacy / Youth Programming", …
$ micro_theme                     <chr> NA, NA, NA, NA, NA, "ESL - Adult", NA,…
$ temp_bucket                     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ attendance_gap                  <dbl> 0, 0, 0, 0, 0, -10, 0, 0, 0, 0, 0, -5,…
$ gap_ratio                       <dbl> NA, NA, NA, NA, NA, Inf, NA, NA, NA, N…
$ has_library_card                <chr> "No", "No", "No", "No", "No", "No", "N…
$ recurring_type                  <chr> "One-Time", "One-Time", "One-Time", "O…

Show code

# Basic data summary
cat("Dataset Overview:\n")

Dataset Overview:

Show code

cat("Total Events:", nrow(event_data), "\n")

Total Events: 12630

Show code

cat("Date Range:", min(event_data$event_date, na.rm = TRUE), "to", max(event_data$event_date, na.rm = TRUE), "\n")

Date Range: 18628 to 20271

Show code

cat("Years Covered:", paste(sort(unique(event_data$event_year)), collapse = ", "), "\n")

Years Covered: 2021, 2022, 2023, 2024, 2025

Show code

cat("Unique Event Subjects:", length(unique(event_data$event_subjects[!is.na(event_data$event_subjects)])), "\n")

Unique Event Subjects: 144

Show code

cat("Library Card Required Events:", sum(event_data$has_library_card == "Yes", na.rm = TRUE), "\n")

Library Card Required Events: 2815

Show code

cat("Open Events:", sum(event_data$has_library_card == "No", na.rm = TRUE), "\n")

Open Events: 9815

Expected vs Actual Registrations Comparison

Show code

# Demonstrate the difference between expected and actual registrations
comparison_data <- event_data %>%
  filter(!is.na(expected_attendees), !is.na(actual_registrations)) %>%
  filter(expected_attendees <= 200, actual_registrations <= 200) # Remove outliers for clearer visualization

# Scatter plot comparison
comparison_data %>%
  ggplot(aes(x = expected_attendees, y = actual_registrations)) +
  geom_point(alpha = 0.6, color = "steelblue") +
  geom_smooth(method = "lm", se = FALSE, color = "red", linetype = "dashed") +
  geom_abline(intercept = 0, slope = 1, color = "darkgreen", linetype = "solid", linewidth = 1) +
  labs(title = "Expected Attendees vs Actual Registrations",
       subtitle = "Red line = trend, Green line = perfect prediction",
       x = "Expected Attendees", 
       y = "Actual Registrations") +
  theme_minimal()

Show code

# Calculate correlation
correlation <- cor(comparison_data$expected_attendees, comparison_data$actual_registrations, use = "complete.obs")
cat("Correlation between Expected and Actual:", round(correlation, 3), "\n")

Correlation between Expected and Actual: 0.136

Show code

cat("Note: Analysis will focus on actual_registrations as the primary metric.\n")

Note: Analysis will focus on actual_registrations as the primary metric.

Global Analysis

Overall Performance Metrics

Show code

# Calculate overall metrics focusing on actual registrations
overall_metrics <- event_data %>%
  summarise(
    total_events = n(),
    total_registrations = sum(actual_registrations, na.rm = TRUE),
    avg_registrations = mean(actual_registrations, na.rm = TRUE),
    median_registrations = median(actual_registrations, na.rm = TRUE),
    events_with_data = sum(!is.na(actual_registrations))
  ) %>%
  mutate(across(where(is.numeric) & !c(total_events, total_registrations, events_with_data), round, 2))

knitr::kable(
  overall_metrics,
  caption = "Overall Event Performance Metrics",
  col.names = c("Total Events", "Total Registrations", "Avg Registrations", "Median Registrations", "Events with Data")
)

Overall Event Performance Metrics
Total Events	Total Registrations	Avg Registrations	Median Registrations	Events with Data
12630	156437	12.39	1	12630

Library Card Requirement Analysis

Show code

# Check if we have the necessary data
if(all(c("has_library_card", "actual_registrations") %in% names(event_data))) {
  
  # Comprehensive library card analysis
  card_metrics <- event_data %>%
    filter(!is.na(has_library_card), !is.na(actual_registrations)) %>%
    group_by(has_library_card) %>%
    summarise(
      event_count = n(),
      total_registrations = sum(actual_registrations, na.rm = TRUE),
      avg_registrations = mean(actual_registrations, na.rm = TRUE),
      median_registrations = median(actual_registrations, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    mutate(
      pct_events = round(100 * event_count / sum(event_count), 1),
      pct_registrations = round(100 * total_registrations / sum(total_registrations), 1)
    )
  
  knitr::kable(
    card_metrics,
    caption = "Event Performance by Library Card Requirement",
    col.names = c("Library Card Required", "Event Count", "Total Reg.", "Avg Reg.", "Median Reg.", "% Events", "% Registrations")
  )
  
  # Create charts for total registrations and event count
  p1 <- card_metrics %>%
    ggplot(aes(x = has_library_card, y = total_registrations, fill = has_library_card)) +
    geom_col(width = 0.6) +
    scale_fill_viridis_d(name = "Library Card\nRequired") +
    labs(title = "Total Registrations by Library Card Requirement",
         x = "Library Card Required", y = "Total Registrations") +
    theme_minimal() +
    geom_text(aes(label = comma(total_registrations)), vjust = -0.5, size = 4)
  
  p2 <- card_metrics %>%
    ggplot(aes(x = has_library_card, y = event_count, fill = has_library_card)) +
    geom_col(width = 0.6) +
    scale_fill_viridis_d(name = "Library Card\nRequired") +
    labs(title = "Number of Events by Library Card Requirement",
         x = "Library Card Required", y = "Number of Events") +
    theme_minimal() +
    geom_text(aes(label = comma(event_count)), vjust = -0.5, size = 4)
  
  # Use patchwork if available, otherwise print separately
  if(requireNamespace("patchwork", quietly = TRUE)) {
    print(p1 / p2)
  } else {
    print(p1)
    print(p2)
  }
}

Overall Attendance Analysis

Highest Attended Events

Show code

# Top 20 highest attended events
if("actual_registrations" %in% names(event_data)) {
  
  # Get available columns for the display
  display_cols <- c("event_title", "event_date", "actual_registrations")
  if("event_subjects" %in% names(event_data)) display_cols <- c(display_cols, "event_subjects")
  if("has_library_card" %in% names(event_data)) display_cols <- c(display_cols, "has_library_card")
  
  top_events <- event_data %>%
    filter(!is.na(actual_registrations)) %>%
    arrange(desc(actual_registrations)) %>%
    slice_head(n = 20) %>%
    select(all_of(display_cols)) %>%
    mutate(event_date = as.character(event_date))
  
  if(nrow(top_events) > 0) {
    DT::datatable(
      top_events,
      caption = "Top 20 Highest Attended Events",
      options = list(pageLength = 10, scrollX = TRUE)
    )
  }
}

Overall Attendance Statistics

Show code

# Detailed attendance statistics
if("actual_registrations" %in% names(event_data)) {
  
  attendance_stats <- event_data %>%
    filter(!is.na(actual_registrations)) %>%
    summarise(
      min_registrations = min(actual_registrations),
      q25_registrations = quantile(actual_registrations, 0.25),
      median_registrations = median(actual_registrations),
      q75_registrations = quantile(actual_registrations, 0.75),
      max_registrations = max(actual_registrations),
      mean_registrations = mean(actual_registrations),
      sd_registrations = sd(actual_registrations),
      events_over_50 = sum(actual_registrations > 50),
      events_over_100 = sum(actual_registrations > 100)
    ) %>%
    mutate(across(where(is.numeric) & !all_of(c("events_over_50", "events_over_100")), ~ round(.x, 2)))
  
  knitr::kable(
    attendance_stats,
    caption = "Registration Distribution Statistics",
    col.names = c("Min", "Q1", "Median", "Q3", "Max", "Mean", "Std Dev", ">50 Reg.", ">100 Reg.")
  )
  
  # Registration distribution histogram
  p <- event_data %>%
    filter(!is.na(actual_registrations), actual_registrations <= 150) %>%
    ggplot(aes(x = actual_registrations))
  
  # Add fill by library card if available
  if("has_library_card" %in% names(event_data)) {
    p <- p + geom_histogram(aes(fill = has_library_card), bins = 30, alpha = 0.7, position = "identity") +
      scale_fill_viridis_d(name = "Library Card\nRequired")
  } else {
    p <- p + geom_histogram(bins = 30, alpha = 0.7, fill = "steelblue")
  }
  
  p <- p + 
    labs(title = "Distribution of Actual Registrations",
         x = "Actual Registrations", y = "Count") +
    theme_minimal()
  
  print(p)
}

Subject and Theme Analysis

Event Subjects Performance

Show code

# Only proceed if event_subjects column exists
if("event_subjects" %in% names(event_data) && "actual_registrations" %in% names(event_data)) {
  
  # Comprehensive subject analysis
  subject_performance <- event_data %>%
    filter(!is.na(event_subjects), event_subjects != "", !is.na(actual_registrations)) %>%
    separate_rows(event_subjects, sep = ";") %>%
    mutate(event_subjects = str_trim(event_subjects)) %>%
    filter(event_subjects != "") %>%
    group_by(event_subjects) %>%
    summarise(
      event_count = n(),
      total_registrations = sum(actual_registrations, na.rm = TRUE),
      avg_registrations = mean(actual_registrations, na.rm = TRUE),
      median_registrations = median(actual_registrations, na.rm = TRUE),
      max_registrations = max(actual_registrations, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    arrange(desc(total_registrations)) %>%
    mutate(across(where(is.numeric) & !all_of(c("event_count", "total_registrations")), ~ round(.x, 2)))
  
  if(nrow(subject_performance) > 0) {
    # Top 15 subjects by total registrations
    top_subjects <- subject_performance %>% slice_head(n = 15)
    
    knitr::kable(
      top_subjects,
      caption = "Top 15 Event Subjects by Total Registrations",
      col.names = c("Subject", "Event Count", "Total Reg.", "Avg Reg.", "Median Reg.", "Max Reg.")
    )
    
    # Visualization: Top subjects by performance metrics
    p1 <- top_subjects %>%
      slice_head(n = 10) %>%
      ggplot(aes(x = reorder(event_subjects, total_registrations), y = total_registrations)) +
      geom_col(fill = "steelblue", alpha = 0.8) +
      coord_flip() +
      labs(title = "Top 10 Subjects by Total Registrations",
           x = "Event Subject", y = "Total Registrations") +
      theme_minimal() +
      theme(axis.text.y = element_text(size = 9))
    
    p2 <- top_subjects %>%
      slice_head(n = 10) %>%
      ggplot(aes(x = reorder(event_subjects, avg_registrations), y = avg_registrations)) +
      geom_col(fill = "darkgreen", alpha = 0.8) +
      coord_flip() +
      labs(title = "Top 10 Subjects by Average Registrations",
           x = "Event Subject", y = "Average Registrations") +
      theme_minimal() +
      theme(axis.text.y = element_text(size = 9))
    
    # Use patchwork if available
    if(requireNamespace("patchwork", quietly = TRUE)) {
      print(p1 / p2)
    } else {
      print(p1)
      print(p2)
    }
  }
} else {
  cat("Event subjects or actual registrations data not available for analysis.\n")
}

Year-by-Year Analysis

Annual Subject Performance

Show code

# Top 10 subjects per year analysis
yearly_top_subjects <- event_data %>%
  filter(!is.na(event_subjects), event_subjects != "", !is.na(event_year), !is.na(actual_registrations)) %>%
  separate_rows(event_subjects, sep = ";") %>%
  mutate(event_subjects = str_trim(event_subjects)) %>%
  filter(event_subjects != "") %>%
  group_by(event_year, event_subjects) %>%
  summarise(
    event_count = n(),
    total_registrations = sum(actual_registrations, na.rm = TRUE),
    avg_registrations = mean(actual_registrations, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(event_year, desc(total_registrations))

# Create a summary table for each year's top 10
yearly_summary <- yearly_top_subjects %>%
  group_by(event_year) %>%
  slice_head(n = 10) %>%
  ungroup() %>%
  mutate(across(where(is.numeric) & !c(event_year, event_count, total_registrations), round, 2))

DT::datatable(
  yearly_summary,
  caption = "Top 10 Subjects by Total Registrations Per Year",
  colnames = c("Year", "Subject", "Event Count", "Total Reg.", "Avg Reg."),
  options = list(pageLength = 15, scrollX = TRUE),
  filter = 'top'
)

Average Registrations by Subject Per Year

Show code

# Heatmap of average registrations by top subjects and year
top_subjects_list <- head(subject_performance$event_subjects, 15)

heatmap_data <- event_data %>%
  filter(!is.na(event_subjects), event_subjects != "", !is.na(event_year), !is.na(actual_registrations)) %>%
  separate_rows(event_subjects, sep = ";") %>%
  mutate(event_subjects = str_trim(event_subjects)) %>%
  filter(event_subjects %in% top_subjects_list) %>%
  group_by(event_year, event_subjects) %>%
  summarise(avg_registrations = mean(actual_registrations, na.rm = TRUE), .groups = "drop")

heatmap_data %>%
  ggplot(aes(x = factor(event_year), y = reorder(event_subjects, avg_registrations), fill = avg_registrations)) +
  geom_tile(color = "white") +
  scale_fill_viridis_c(name = "Avg Reg.") +
  labs(title = "Average Registrations Heatmap: Top 15 Subjects by Year",
       x = "Year", y = "Subject") +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 9), axis.text.x = element_text(angle = 45, hjust = 1))

Total Registrations by Subject Per Year

Show code

# Stacked bar chart of total registrations by year for top subjects
yearly_totals <- event_data %>%
  filter(!is.na(event_subjects), event_subjects != "", !is.na(event_year), !is.na(actual_registrations)) %>%
  separate_rows(event_subjects, sep = ";") %>%
  mutate(event_subjects = str_trim(event_subjects)) %>%
  filter(event_subjects %in% head(top_subjects_list, 10)) %>%
  group_by(event_year, event_subjects) %>%
  summarise(total_registrations = sum(actual_registrations, na.rm = TRUE), .groups = "drop")

yearly_totals %>%
  ggplot(aes(x = factor(event_year), y = total_registrations, fill = event_subjects)) +
  geom_col(position = "stack") +
  scale_fill_viridis_d(name = "Subject") +
  labs(title = "Total Registrations by Year: Top 10 Subjects (Stacked)",
       x = "Year", y = "Total Registrations") +
  theme_minimal() +
  theme(legend.position = "bottom")

Subject Deep Dive Across Years

All Subjects Trend Analysis

Show code

# Comprehensive trend analysis for all subjects with sufficient data
all_subject_trends <- event_data %>%
  filter(!is.na(event_subjects), event_subjects != "", !is.na(event_year), !is.na(actual_registrations)) %>%
  separate_rows(event_subjects, sep = ";") %>%
  mutate(event_subjects = str_trim(event_subjects)) %>%
  filter(event_subjects != "") %>%
  group_by(event_subjects, event_year) %>%
  summarise(
    event_count = n(),
    total_registrations = sum(actual_registrations, na.rm = TRUE),
    avg_registrations = mean(actual_registrations, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  # Only include subjects that have data in multiple years
  group_by(event_subjects) %>%
  filter(n() > 1) %>%
  ungroup()

# Calculate growth rates and statistics for each subject
subject_growth_analysis <- all_subject_trends %>%
  group_by(event_subjects) %>%
  arrange(event_year) %>%
  summarise(
    years_active = n(),
    first_year = min(event_year),
    last_year = max(event_year),
    first_year_avg = first(avg_registrations),
    last_year_avg = last(avg_registrations),
    total_events = sum(event_count),
    total_reg_all_years = sum(total_registrations),
    .groups = "drop"
  ) %>%
  mutate(
    growth_rate = round(100 * (last_year_avg - first_year_avg) / first_year_avg, 2),
    avg_events_per_year = round(total_events / years_active, 1)
  ) %>%
  arrange(desc(total_reg_all_years))

knitr::kable(
  head(subject_growth_analysis, 20),
  caption = "Subject Performance Analysis Across Years (Top 20 by Total Registrations)",
  col.names = c("Subject", "Years Active", "First Year", "Last Year", "First Yr Avg", "Last Yr Avg", 
                "Total Events", "Total Reg.", "Growth Rate %", "Avg Events/Yr")
)

Subject Performance Analysis Across Years (Top 20 by Total Registrations)
Subject	Years Active	First Year	Last Year	First Yr Avg	Last Yr Avg	Total Events	Total Reg.	Growth Rate %	Avg Events/Yr
No Subjects Listed	5	2021	2025	12.185422	11.4461778	4577	46428	-6.07	915.4
Makerplace	5	2021	2025	19.764706	38.5466667	982	25965	95.03	196.4
ESL	5	2021	2025	7.288714	8.4212329	2198	18121	15.54	439.6
Technology Classes	5	2021	2025	14.327957	8.3761468	1547	11831	-41.54	309.4
Music	5	2021	2025	0.000000	162.3076923	73	11697	Inf	14.6
Movies	5	2021	2025	32.727273	36.3157895	126	8434	10.96	25.2
4	5	2021	2025	7.500000	32.6923077	145	7819	335.90	29.0
Books & Authors	5	2021	2025	15.453333	23.7045455	360	7507	53.39	72.0
Senior Center	5	2021	2025	6.862500	5.8915663	1317	6924	-14.15	263.4
3	5	2021	2025	20.296296	14.3750000	230	6556	-29.17	46.0
Business & Nonprofit	5	2021	2025	31.660377	34.7291667	302	6273	9.69	60.4
Jobs & Careers	5	2021	2025	124.900000	21.7435897	170	4124	-82.59	34.0
Civics & Voting	5	2021	2025	83.900000	157.5000000	31	3619	87.72	6.2
Genealogy	5	2021	2025	65.133333	4.4489796	311	3608	-93.17	62.2
One Book One Village	4	2021	2024	30.833333	60.8750000	87	3302	97.43	21.8
Exhibits	5	2021	2025	28.571429	109.3333333	43	2801	282.67	8.6
Storytime	5	2021	2025	9.094937	0.1318681	1246	2636	-98.55	249.2
Personal Finance	4	2022	2025	82.000000	24.5000000	31	1912	-70.12	7.8
Used Book Sales	5	2021	2025	55.735294	0.0000000	105	1895	-100.00	21.0
College & Careers	4	2021	2025	16.888889	42.5000000	24	526	151.64	6.0

Subject Trend Visualizations

Show code

# Create trend lines for top performing subjects
top_trending_subjects <- head(subject_growth_analysis$event_subjects, 8)

trend_data <- all_subject_trends %>%
  filter(event_subjects %in% top_trending_subjects)

# Multi-panel trend visualization
p1 <- trend_data %>%
  ggplot(aes(x = event_year, y = avg_registrations, color = event_subjects)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 3) +
  scale_color_viridis_d(name = "Subject") +
  labs(title = "Average Registrations Trends: Top 8 Subjects",
       x = "Year", y = "Average Registrations") +
  theme_minimal() +
  theme(legend.position = "bottom")

p2 <- trend_data %>%
  ggplot(aes(x = event_year, y = total_registrations, color = event_subjects)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 3) +
  scale_color_viridis_d(name = "Subject") +
  labs(title = "Total Registrations Trends: Top 8 Subjects",
       x = "Year", y = "Total Registrations") +
  theme_minimal() +
  theme(legend.position = "bottom")

p1 / p2

Year-over-Year Comparison Tables

Show code

# Create year-over-year comparison for top subjects
yoy_comparison <- all_subject_trends %>%
  filter(event_subjects %in% head(subject_growth_analysis$event_subjects, 10)) %>%
  select(event_subjects, event_year, avg_registrations) %>%
  pivot_wider(names_from = event_year, values_from = avg_registrations, names_prefix = "Year_") %>%
  arrange(desc(rowSums(select(., -event_subjects), na.rm = TRUE)))

# Replace NA with "-" for better display
yoy_comparison[is.na(yoy_comparison)] <- "-"

DT::datatable(
  yoy_comparison,
  caption = "Year-over-Year Average Registrations Comparison (Top 10 Subjects)",
  options = list(scrollX = TRUE, pageLength = 10)
)

Statistical Analysis and Seasonality

Show code

# Seasonality analysis by month for top subjects
monthly_patterns <- event_data %>%
  filter(!is.na(event_subjects), event_subjects != "", !is.na(event_month), !is.na(actual_registrations)) %>%
  separate_rows(event_subjects, sep = ";") %>%
  mutate(event_subjects = str_trim(event_subjects)) %>%
  filter(event_subjects %in% head(subject_performance$event_subjects, 8)) %>%
  group_by(event_subjects, event_month) %>%
  summarise(
    avg_registrations = mean(actual_registrations, na.rm = TRUE),
    event_count = n(),
    .groups = "drop"
  )

# Monthly patterns heatmap
monthly_patterns %>%
  ggplot(aes(x = factor(event_month), y = reorder(event_subjects, avg_registrations), fill = avg_registrations)) +
  geom_tile(color = "white") +
  scale_fill_viridis_c(name = "Avg Reg.") +
  labs(title = "Monthly Registration Patterns: Top 8 Subjects",
       x = "Month", y = "Subject") +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 10))

Show code

# Growth rate distribution
subject_growth_analysis %>%
  filter(!is.infinite(growth_rate), !is.na(growth_rate)) %>%
  ggplot(aes(x = growth_rate)) +
  geom_histogram(bins = 20, fill = "steelblue", alpha = 0.7) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Distribution of Subject Growth Rates",
       subtitle = "Positive values indicate growth in average registrations",
       x = "Growth Rate (%)", y = "Number of Subjects") +
  theme_minimal()

Enhanced Analysis with Library Card Segmentation

Library Card Segmentation by Subject

Show code

# Subject performance by library card requirement
subject_card_analysis <- event_data %>%
  filter(!is.na(event_subjects), event_subjects != "", !is.na(has_library_card), !is.na(actual_registrations)) %>%
  separate_rows(event_subjects, sep = ";") %>%
  mutate(event_subjects = str_trim(event_subjects)) %>%
  filter(event_subjects %in% head(subject_performance$event_subjects, 10)) %>%
  group_by(event_subjects, has_library_card) %>%
  summarise(
    event_count = n(),
    total_registrations = sum(actual_registrations, na.rm = TRUE),
    avg_registrations = mean(actual_registrations, na.rm = TRUE),
    .groups = "drop"
  )

# Stacked bar chart by subject and library card requirement
subject_card_analysis %>%
  ggplot(aes(x = reorder(event_subjects, total_registrations), y = total_registrations, fill = has_library_card)) +
  geom_col(position = "stack") +
  coord_flip() +
  scale_fill_viridis_d(name = "Library Card\nRequired") +
  labs(title = "Total Registrations by Subject and Library Card Requirement",
       x = "Subject", y = "Total Registrations") +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 9))

Time-Based Patterns with Library Card Segmentation

Show code

# Day of week analysis with library card segmentation
dow_card_analysis <- event_data %>%
  filter(!is.na(event_day_of_week), !is.na(has_library_card), !is.na(actual_registrations)) %>%
  group_by(event_day_of_week, has_library_card) %>%
  summarise(
    event_count = n(),
    total_registrations = sum(actual_registrations, na.rm = TRUE),
    avg_registrations = mean(actual_registrations, na.rm = TRUE),
    .groups = "drop"
  )

# Grouped bar chart
dow_card_analysis %>%
  ggplot(aes(x = reorder(event_day_of_week, total_registrations), y = avg_registrations, fill = has_library_card)) +
  geom_col(position = "dodge") +
  coord_flip() +
  scale_fill_viridis_d(name = "Library Card\nRequired") +
  labs(title = "Average Registrations by Day of Week and Library Card Requirement",
       x = "Day of Week", y = "Average Registrations") +
  theme_minimal()

Show code

# Monthly patterns with library card segmentation
monthly_card_analysis <- event_data %>%
  filter(!is.na(event_month), !is.na(has_library_card), !is.na(actual_registrations)) %>%
  group_by(event_month, has_library_card) %>%
  summarise(
    event_count = n(),
    total_registrations = sum(actual_registrations, na.rm = TRUE),
    avg_registrations = mean(actual_registrations, na.rm = TRUE),
    .groups = "drop"
  )

monthly_card_analysis %>%
  ggplot(aes(x = factor(event_month), y = total_registrations, fill = has_library_card)) +
  geom_col(position = "dodge") +
  scale_fill_viridis_d(name = "Library Card\nRequired") +
  labs(title = "Total Registrations by Month and Library Card Requirement",
       x = "Month", y = "Total Registrations") +
  theme_minimal()

Summary of Key Insights

Show code

# Calculate key statistics for findings
total_events <- nrow(event_data)
total_registrations <- sum(event_data$actual_registrations, na.rm = TRUE)
most_popular_subject <- subject_performance$event_subjects[1]
best_avg_subject <- subject_performance$event_subjects[which.max(subject_performance$avg_registrations)]
card_required_pct <- round(100 * sum(event_data$has_library_card == "Yes", na.rm = TRUE) / 
                          sum(!is.na(event_data$has_library_card)), 1)

# Growth analysis
if(nrow(yearly_top_subjects) > 0) {
  years_span <- max(yearly_top_subjects$event_year) - min(yearly_top_subjects$event_year) + 1
} else {
  years_span <- length(unique(event_data$event_year))
}

cat("=== KEY FINDINGS ===\n\n")

=== KEY FINDINGS ===

Show code

cat("📊 Dataset Overview:\n")

📊 Dataset Overview:

Show code

cat("• Total Events:", comma(total_events), "\n")

• Total Events: 12,630

Show code

cat("• Total Registrations:", comma(total_registrations), "\n")

• Total Registrations: 156,437

Show code

cat("• Analysis Period:", years_span, "years\n")

• Analysis Period: 5 years

Show code

cat("• Library Card Required Events:", card_required_pct, "%\n\n")

• Library Card Required Events: 22.3 %

Show code

cat("🏆 Top Performers:\n")

🏆 Top Performers:

Show code

cat("• Highest Volume Subject:", most_popular_subject, "\n")

• Highest Volume Subject: No Subjects Listed

Show code

cat("• Best Average Performance:", best_avg_subject, "\n\n")

• Best Average Performance: Music

Show code

cat("📈 Key Patterns:\n")

📈 Key Patterns:

Show code

cat("• Expected vs Actual correlation:", round(correlation, 3), "\n")

• Expected vs Actual correlation: 0.136

Show code

cat("• Subjects with multi-year data:", nrow(subject_growth_analysis), "\n")

• Subjects with multi-year data: 23

Strategic Recommendations

📋 Programming Strategy: - Focus Investment: Prioritize resources on subjects showing consistent high performance and positive growth trends - Library Card Impact: Consider the registration patterns between card-required and open events when planning programming mix - Seasonal Optimization: Use monthly patterns to optimize event scheduling for maximum attendance - Subject Diversification: Balance high-performing subjects with emerging topics showing growth potential

📊 Resource Allocation: - Capacity Planning: Use historical registration data to better estimate event capacity needs - Staff Assignment: Allocate experienced staff to high-registration events and subjects - Budget Distribution: Weight budget allocation based on total registration volume and growth trends - Room Assignment: Match room capacity to expected attendance patterns by subject and day

📈 Performance Monitoring: - KPI Tracking: Focus on actual registrations as the primary success metric rather than expected attendees - Growth Tracking: Monitor year-over-year growth rates for early identification of trending subjects - Segmentation Analysis: Continue analyzing library card requirement impact on different subject areas - Seasonal Adjustments: Account for monthly variations when setting registration targets

🎯 Future Opportunities: - Subject Development: Investigate why certain subjects consistently outperform others - Accessibility Analysis: Examine whether library card requirements are limiting attendance for specific demographics - Cross-Subject Promotion: Leverage popular subjects to introduce attendees to emerging topic areas - Data-Driven Scheduling: Use day-of-week and time-of-day patterns to optimize event scheduling

This comprehensive analysis provides actionable insights for data-driven library event programming. The focus on actual registrations and library card segmentation reveals important patterns that can guide strategic decisions for maximizing community engagement and resource efficiency.