Iris Wu Final Project

Introduction

The Institute of Museum and Library Services (IMLS) is an independent federal agency that supports libraries and museums across the US. This past March, the current administration issued an executive order that virtually dismantled the IMLS. Since I have been following this story for a while, I focused my project on public libraries. I obtained my dataset from IMLS’s Public Libraries Survey (PLS) for fiscal year (FY) 2023, which spans from July 1, 2022 to June 30, 2023. Since 1988, the IMLS has conducted an annual survey of public library systems in the US. Approximately 9,000 public library systems are surveyed. To collect the data, data coordinators are appointed by each state’s or US territory’s library agency. These coordinators collect the data from their local libraries and report it to the IMLS via a web-based system.

First, I narrowed the dataset down to the variables I thought would be useful to examine. Due to space and visual considerations, I did not incorporate all of the variables that I selected. I used the associated Data Documentation and User Guide to understand the data definitions. Below, I have listed the definitions for the more obscure variables in the dataset, as well as how I used them.

Variable Name	Definition
C_FSCS	The C_FSCS stands for the Federal-State Cooperative System Public Library Definition. A library must meet the requirements under this definition to be considered a public library. I removed all the libraries in the dataset that did not meet this definition (i.e., they were marked as N/No)
POPU_UND	This is the unduplicated population of the library’s service area. Unduplicated means that each person is only counted once.
LOCGVT, STGVT, FEDGVT, TOTINCM	LOCGVT, STGVT, and FEDGVT represent operational revenue/funding from the local, state, and federal governments, respectively. Operational revenue covers staff salaries, benefits, and other daily administrative needs. TOTINCM is total operational revenue, which in the PLS also includes other operating revenue (OTHINCM) in addition to these three sources.
LCAP_REV, SCAP_REV, FCAP_REV, CAP_REV	The first three variables represent capital revenue/funding from the local, state, and federal governments. Capital revenue includes funding for major investments, such as renovations or a new building. CAP_REV is the total capital revenue. I combined TOTINCM and CAP_REV to find the total funding for each library (TOTFUN).
TOTOPEXP, CAPITAL	TOTOPEXP stands for total operational expenditures. CAPITAL stands for total capital expenditures. I combined these two variables to find the total expenditure for each library (TOTEXP). I then subtracted TOTEXP from TOTFUN to find each library’s budget balance (DEF): a negative value is a deficit and a positive value is a surplus.
BKVOL, TOTPHYS	BKVOL is the total number of books and print material in a library’s collection. TOTPHYS is the total number of physical items (including books, videos, audiobooks, etc.) in a library’s collection.
HRS_OPEN, WIFISESS	HRS_OPEN is the total number of hours a library was open in FY2023. WIFISESS is the total number of WiFi sessions a library provided in FY2023. I did not use these variables because I thought adding them to the map plot might lead to data overload in the tooltips.
VISITS, TOTCIR	VISITS is the total number of visits a library received in fiscal year 2023. TOTCIR is the total number of circulation transactions that a library processed. These transactions can include people borrowing or returning books and other items from the library.
TOTPRO, TOTATTEN	TOTPRO is the total number of synchronous programs that a library hosted. TOTATTEN counts the total attendance at these programs.
LOCALE_ADD	The PLS sorts library systems into four broad geographic categories – city, suburban, town, and rural – according to where their administrative HQ and branch libraries are located. There are multiple subcategories for each broad category, such as “small”, “mid-size”, “remote”, and “fringe”. Each category-subcategory combination has a corresponding number code. For the sake of simplicity, I stuck only with the four broad categories, and I converted the codes into text.

To clean the data, I removed all instances of missing data (represented by -1 in the dataset), as well as libraries that were closed (represented by -3 in the dataset). For better readability, I converted the library and city names to title case.

Public libraries are not just places to borrow books; they also provide many beneficial services, like educational programs, language classes, training for jobseekers, and free internet. Moreover, they are one of the few free quiet spaces in public. Libraries are an important part of my life – as a kid, I spent almost every weekend at my local library branch, and I still borrow most of my books from the Montgomery County Public Library system. However, public libraries across the country are struggling with funding cuts, rising costs, and dwindling political support. The March executive order that dismantled the IMLS has only exacerbated the situation. While The New York Times reported that a court ruling earlier this month has restored thousands of cancelled IMLS grants, it will be difficult to undo the damage. According to another New York Times article, some libraries are on the verge of closing because they cannot secure sufficient funding. Rural libraries are most at risk. While the PLS 2023 data may not reflect the impact of these recent cuts, I wanted to focus my visualizations on funding and locale. My goal was to determine the relationship between the locale of a library system and its total funding and whether other variables could predict funding. I also wanted to provide an overview of rural public libraries in the US, since they currently need the most help.

Load dataset and libraries

library(tidyverse)
library(ggfortify)
library(highcharter)
library(leaflet)
library(knitr)
library(webshot2)
setwd("C:/Users/rsaidi/Downloads")
#setwd("C:/Users/iwu80/OneDrive/Documents/Files/School/DATA 110 R Assignments")
libraries <- read_csv("PLS Libraries FY23.csv")

Clean the dataset

#select relevant columns 
libs <- libraries |>
  select(STABR, LIBNAME, CITY, C_FSCS, POPU_UND, LOCGVT, STGVT, FEDGVT, TOTINCM, TOTOPEXP, LCAP_REV, SCAP_REV, FCAP_REV, CAP_REV, CAPITAL, BKVOL, TOTPHYS, HRS_OPEN, VISITS, TOTPRO, TOTCIR, TOTATTEN, WIFISESS, LATITUDE, LONGITUD, LOCALE_ADD)
#filter the data
libs2 <- libs |>
  filter(C_FSCS == "Y") # <- remove any libraries that do not meet the FSCS Public Library Definition
#more filtering 
libs_clean <- libs2 |>
  # -1 represents missing data. remove all instances of missing data. 
  filter(TOTATTEN != -1 & WIFISESS != -1 & TOTOPEXP != -1 & CAPITAL != -1 & HRS_OPEN != -1 & TOTPRO != -1 & BKVOL != -1) |> 
  #-3 indicates the library is closed or temporarily closed. remove the three closed libraries in the dataset
  filter(LIBNAME != "ANIAK PUBLIC LIBRARY" & LIBNAME != "MAGNOLIA SPRINGS PUBLIC LIBRARY" & LIBNAME != "PUEBLO DE COCHITI LIBRARY") |>
  #remove the C_FSCS column since it is no longer necessary
  select(-C_FSCS) |>
  #rename the "longitude" column because it is misspelled 
  rename("LONGITUDE" = "LONGITUD")

#create three new variables, "Total Funding" (TOTFUN), "Total Expenditures" (TOTEXP), and "Budget Deficit" (DEF)
libs3 <- libs_clean |>
  #total funding is the sum of operating revenue and capital revenue 
  mutate(TOTFUN = TOTINCM + CAP_REV) |> 
  #total expenditures is the sum of operating expenditures and capital expenditures
  mutate(TOTEXP = TOTOPEXP + CAPITAL) |>
  #subtract total expenditures from total funding; a negative result is a deficit, a positive result is a surplus
  mutate(DEF = TOTFUN - TOTEXP)

#convert the library names to title case; I used this site for help with the code: https://stringr.tidyverse.org/reference/case.html
libs3$LIBNAME <- str_to_title(libs3$LIBNAME)
#convert the city names to title case
libs3$CITY <- str_to_title(libs3$CITY)
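
The missing-data filter above checks seven columns by hand. As a minimal sketch of a more general version, the code below drops any row in which a checked column holds one of the sentinel codes (-1 for missing, -3 for closed). It assumes dplyr 1.0.4 or later for `if_all()`. The table `toy`, its values, and the name `sentinels` are made up for illustration and are not from the PLS file.

```r
library(dplyr)

# Toy stand-in for the PLS data; values are invented for illustration only
toy <- tibble(
  LIBNAME  = c("A", "B", "C", "D"),
  VISITS   = c(1200, -1, 5400, 800),
  TOTPRO   = c(30, 12, -3, 45),
  WIFISESS = c(500, 200, 100, 950)
)

# Sentinel codes described in the text: -1 = missing, -3 = closed
sentinels <- c(-1, -3)

# Keep only rows where none of the checked columns holds a sentinel code
toy_clean <- toy |>
  filter(if_all(c(VISITS, TOTPRO, WIFISESS), ~ !.x %in% sentinels))

toy_clean$LIBNAME
```

Listing the columns once inside `if_all()` makes it harder to forget a variable when the set of analyzed columns changes.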

Multiple Linear Regression Analysis

I wanted to find which variables strongly correlate with the amount of total funding a public library receives. Do busier libraries or larger libraries (i.e., libraries with more visits, more program attendance, and larger collections) receive more funding?

I used the backwards elimination process to determine whether the following variables have a strong correlation with total funding:

POPU_UND	Total population of a library’s service area
TOTPHYS	Total physical items in collection
VISITS	Total number of library visits
TOTCIR	Total annual circulation transactions
TOTATTEN	Total attendance at library programs

After running the first model, I removed TOTATTEN because it had the largest p-value (.451), well above the conventional .05 threshold. The remaining variables all had p-values smaller than 2e-16, making them statistically significant. Removing TOTATTEN barely changed the adjusted R-squared, which rose from .8995 to .8996. Together, the large p-value and the negligible change in adjusted R-squared indicate that program attendance adds little predictive value once the other variables are in the model. One possible explanation is that TOTATTEN is collinear with another predictor (perhaps VISITS) and therefore redundant.
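
One backward-elimination step, together with a rough check of the collinearity suspicion, can be sketched in base R. The data below are simulated, not the PLS file, and the names `popu`, `visits`, `atten`, and `funding` are hypothetical. The variance inflation factor (VIF) is computed by hand as 1 / (1 - R-squared) from regressing one predictor on the others. This is an illustration of the technique, not the code used for the models reported here.

```r
set.seed(110)  # arbitrary seed so the simulated data are reproducible

# Simulated data: attendance is built to track visits closely
n       <- 500
popu    <- runif(n, 1e3, 1e5)
visits  <- 3 * popu + rnorm(n, sd = 2e4)
atten   <- 0.1 * visits + rnorm(n, sd = 500)
funding <- 10 * popu + 11 * visits + rnorm(n, sd = 1e5)
sim     <- data.frame(popu, visits, atten, funding)

# Full model, then drop one predictor with update()
fit_full    <- lm(funding ~ popu + visits + atten, data = sim)
fit_reduced <- update(fit_full, . ~ . - atten)

# Compare adjusted R-squared before and after the removal
adj_r2 <- c(full    = summary(fit_full)$adj.r.squared,
            reduced = summary(fit_reduced)$adj.r.squared)

# Hand-computed VIF for atten: 1 / (1 - R^2) of atten regressed on the others
vif_atten <- 1 / (1 - summary(lm(atten ~ popu + visits, data = sim))$r.squared)

adj_r2
vif_atten
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic collinearity. Running the same check on TOTATTEN would show whether the redundancy explanation holds for the real data.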

The residuals vs fitted plot for the second model, with TOTATTEN removed, showed that three libraries in California were outliers: rows 373 (LA County Library), 409 (SF Public Library), and 424 (South SF Library). I removed these three rows before fitting the final model, which raised the adjusted R-squared from .8996 to .927.
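
Influential rows can also be flagged numerically rather than read off the diagnostic plot. The sketch below uses Cook's distance, a different technique from the visual inspection described above, on simulated data in which three rows are deliberately made extreme. The data frame `sim` and the choice to flag exactly three rows are assumptions for illustration.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Simulated data with three deliberately extreme rows (10, 20, 30)
x <- runif(100, 0, 100)
y <- 5 * x + rnorm(100, sd = 10)
y[c(10, 20, 30)] <- y[c(10, 20, 30)] + 500
sim <- data.frame(x, y)

fit <- lm(y ~ x, data = sim)

# Rank observations by Cook's distance and take the three most influential
cooks   <- cooks.distance(fit)
flagged <- as.integer(names(sort(cooks, decreasing = TRUE))[1:3])

# Refit without the flagged rows and compare adjusted R-squared
fit_trim <- lm(y ~ x, data = sim[-flagged, ])
adj_r2 <- c(before = summary(fit)$adj.r.squared,
            after  = summary(fit_trim)$adj.r.squared)

flagged
adj_r2
```

Dropping rows by position, as in `sim[-flagged, ]`, depends on the current row order, so the indices should be recomputed whenever the upstream filtering changes.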

#this is my final model  
options(scipen = 0)
libs4 <- libs3[-c(373, 409, 424),] #remove the outliers
fit_best <- lm(libs4$TOTFUN ~ libs4$POPU_UND + libs4$TOTPHYS + libs4$VISITS + libs4$TOTCIR, data = libs4)
summary(fit_best)

## 
## Call:
## lm(formula = libs4$TOTFUN ~ libs4$POPU_UND + libs4$TOTPHYS + 
##     libs4$VISITS + libs4$TOTCIR, data = libs4)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -39601529   -250061    159220    303126  45829237 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -3.951e+05  3.145e+04  -12.56   <2e-16 ***
## libs4$POPU_UND  1.012e+01  4.553e-01   22.22   <2e-16 ***
## libs4$TOTPHYS   8.947e+00  1.254e-01   71.33   <2e-16 ***
## libs4$VISITS    1.139e+01  3.017e-01   37.76   <2e-16 ***
## libs4$TOTCIR    1.208e+00  7.487e-02   16.14   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2562000 on 7218 degrees of freedom
## Multiple R-squared:  0.927,  Adjusted R-squared:  0.927 
## F-statistic: 2.292e+04 on 4 and 7218 DF,  p-value: < 2.2e-16

autoplot(fit_best)

The equation for the model is:

Predicted Total Funding = -395095.4 + 10.117(POPU_UND) + 8.947(TOTPHYS) + 11.391(VISITS) + 1.208(TOTCIR)

According to the adjusted R-squared, the model explains approximately 92.7% of the variance in total funding, which makes it a relatively strong model. The coefficients for each variable can be interpreted as follows:

10.117(POPU_UND)	For every one-person increase in the population of a library’s service area, predicted total funding increases by about $10.12, if all other predictors are held constant.
8.947(TOTPHYS)	For every one-item increase in a library’s physical collection, predicted total funding increases by about $8.95, if all other predictors are held constant.
11.391(VISITS)	For every additional visit to a library, predicted total funding increases by about $11.39, if all other predictors are held constant.
1.208(TOTCIR)	For every additional circulation transaction, predicted total funding increases by about $1.21, if all other predictors are held constant.
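
To make the fitted equation concrete, here is a minimal worked example in base R. The coefficients are copied from the equation above. The library system `example_lib` is hypothetical, and its values are invented rather than taken from a row of the PLS data.

```r
# Coefficients copied from the fitted equation in the text
coefs <- c(intercept = -395095.4,
           POPU_UND  = 10.117,
           TOTPHYS   = 8.947,
           VISITS    = 11.391,
           TOTCIR    = 1.208)

# Hypothetical library system (invented values, not a row from the PLS data)
example_lib <- c(POPU_UND = 10000, TOTPHYS = 50000,
                 VISITS = 40000, TOTCIR = 60000)

# Predicted funding = intercept + sum(coefficient * value)
predicted <- unname(coefs["intercept"] +
                    sum(coefs[names(example_lib)] * example_lib))
predicted  # about 681,545 dollars
```

Because the intercept is negative, the equation returns a negative prediction for very small systems (for example, -395,095.4 when every predictor is zero). The model is therefore best read as describing mid-sized and large systems.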

I did not expect some of the predictors in my model to correlate so strongly with total funding. According to WebJunction, an educational website for library staff, demand or busyness usually does not determine public library funding in the US. Surprisingly, visits have the largest per-unit coefficient in my model: each additional visit is associated with a larger increase in predicted funding than one additional resident, item, or circulation transaction. When contextualized with other factors that can determine library funding, like property taxes or government grants, visits may not matter much. However, a high volume of visits suggests that a library system meets a demand in the community (or at least is well appreciated), which in turn may lead the community to prioritize funding for it.

I was less surprised by the other predictors. Library systems in highly populated areas are more likely to be well-funded because they have larger and more diverse revenue streams. Similarly, a library system that can afford a large collection is likely well-funded. Total circulation has the smallest coefficient in the model. This is unsurprising because a circulation transaction (such as borrowing or returning books) can be made multiple times by the same person, so it is not the most precise indicator of how in demand a library system is.

Data Visualizations

Prepare the data for Plot 1

#The numbers in the LOCALE_ADD column each represent a geographic category (city, suburb, town, rural) and subcategory. To sort the data into four categories - city, suburban, town, rural - convert the number codes to text.
libs3 <- libs3 |>
  mutate(LOCALE_ADD = case_when(
    LOCALE_ADD %in% c(11, 12, 13) ~ "City",     #codes 11, 12, 13
    LOCALE_ADD %in% c(21, 22, 23) ~ "Suburban", #codes 21, 22, 23
    LOCALE_ADD %in% c(31, 32, 33) ~ "Town",     #codes 31, 32, 33
    LOCALE_ADD %in% c(41, 42, 43) ~ "Rural",    #codes 41, 42, 43
    TRUE ~ as.character(LOCALE_ADD)             #leave any other code unchanged, as text
  ))

#find the respective totals for federal, state, and local government funding by adding up operating revenue and capital revenue for each level. 
libs_tot <- libs3 |>
  #total funding from local govt
  mutate(TOTLOC = LOCGVT + LCAP_REV) |>
  #total funding from state govt 
  mutate(TOTSTA = STGVT + SCAP_REV) |>
  #total funding from fed govt 
  mutate(TOTFED = FEDGVT + FCAP_REV)

#separate into four datasets based on locale category 
#city
libs_city <- libs_tot |>
  filter(LOCALE_ADD == "City")
#suburb
libs_sub <- libs_tot|>
  filter(LOCALE_ADD == "Suburban")
#town 
libs_town <- libs_tot |>
  filter(LOCALE_ADD == "Town") 
#rural 
libs_ru <- libs_tot |>
  filter(LOCALE_ADD == "Rural")

#for each dataset, find the averages for each level of government funding
city_avgs <- libs_city |>
  group_by(LOCALE_ADD) |>
  summarize(avgtotloc = mean(TOTLOC), avgtotsta = mean(TOTSTA), avgtotfed = mean(TOTFED), .groups = "drop") 
#suburb 
sub_avgs <- libs_sub |>
  group_by(LOCALE_ADD) |>
  summarize(avgtotloc = mean(TOTLOC), avgtotsta = mean(TOTSTA), avgtotfed = mean(TOTFED), .groups = "drop")
#town 
town_avgs <- libs_town |>
  group_by(LOCALE_ADD) |>
  summarize(avgtotloc = mean(TOTLOC), avgtotsta = mean(TOTSTA), avgtotfed = mean(TOTFED), .groups = "drop")
#rural 
rur_avgs <- libs_ru |>
  group_by(LOCALE_ADD) |>
  summarize(avgtotloc = mean(TOTLOC), avgtotsta = mean(TOTSTA), avgtotfed = mean(TOTFED), .groups = "drop")
#join the datasets 
libfunds <- bind_rows(city_avgs, sub_avgs, town_avgs, rur_avgs)
head(libfunds)

## # A tibble: 4 × 4
##   LOCALE_ADD avgtotloc avgtotsta avgtotfed
##   <chr>          <dbl>     <dbl>     <dbl>
## 1 City       13118991.  1224622.   179765.
## 2 Suburban    2642601.   242121.    15224.
## 3 Town         674205.   120579.    10751.
## 4 Rural        226882.    29580.     2485.

#pivot longer so data is easier to plot 
fundslibs <- libfunds |>
  pivot_longer(cols = 2:4, names_to = " Funding Level", values_to = "Avg Funding" )
#rename the funding levels 
fundslibs$` Funding Level` <- gsub("avgtotloc", "Local", fundslibs$` Funding Level`)
fundslibs$` Funding Level`<- gsub("avgtotsta", "State", fundslibs$` Funding Level`)
fundslibs$` Funding Level`<- gsub("avgtotfed", "Federal", fundslibs$` Funding Level`)
head(fundslibs)

## # A tibble: 6 × 3
##   LOCALE_ADD ` Funding Level` `Avg Funding`
##   <chr>      <chr>                    <dbl>
## 1 City       Local                13118991.
## 2 City       State                 1224622.
## 3 City       Federal                179765.
## 4 Suburban   Local                 2642601.
## 5 Suburban   State                  242121.
## 6 Suburban   Federal                 15224.
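
The four filter-then-summarize steps above can also be written as one grouped summary. Below is a minimal sketch, assuming dplyr 1.0 or later and tidyr. The table `toy` and its values are invented, not PLS figures, and the column names `funding_level` and `avg_funding` are hypothetical. The factor levels are set by hand so that the City, Suburban, Town, Rural order is preserved.

```r
library(dplyr)
library(tidyr)

# Invented toy data standing in for libs_tot
toy <- tibble(
  LOCALE_ADD = c("City", "City", "Rural", "Rural", "Town", "Suburban"),
  TOTLOC     = c(100, 300, 10, 30, 50, 80),
  TOTSTA     = c(20, 40, 2, 4, 6, 8),
  TOTFED     = c(2, 4, 1, 1, 1, 1)
)

funds_long <- toy |>
  # Fix the category order so it does not fall back to alphabetical
  mutate(LOCALE_ADD = factor(LOCALE_ADD,
                             levels = c("City", "Suburban", "Town", "Rural"))) |>
  group_by(LOCALE_ADD) |>
  # Average every funding column in one call
  summarize(across(c(TOTLOC, TOTSTA, TOTFED), mean), .groups = "drop") |>
  # One row per locale and funding level, ready for plotting
  pivot_longer(cols = -LOCALE_ADD,
               names_to = "funding_level", values_to = "avg_funding") |>
  mutate(funding_level = recode(funding_level,
                                TOTLOC = "Local", TOTSTA = "State", TOTFED = "Federal"))

funds_long
```

The separate `libs_ru` subset is still needed for the map, so this form would replace only the averaging steps.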

Plot 1

#set the palette (I used https://www.color-hex.com/ to find color hex codes)
colors <- c("#0294b8", "#8d9bc4", "#cac371")
#plot the data 
highchart() |>
  hc_add_series(data = fundslibs, type = "column", hcaes(x = LOCALE_ADD, y = fundslibs$`Avg Funding`, group = fundslibs$` Funding Level` )) |>
  hc_colors(colors) |>
  hc_title(text = "Funding for US Public Libraries, Fiscal Year 2023", style = list(fontWeight = "bold"), align = "center") |>
  hc_xAxis(type = "category", title = list(text = "Geographic Area of Library System")) |>
  hc_yAxis(title = list(text = "Average Total Funding, $")) |>
  hc_subtitle(text = "Source: Institute of Museum and Library Services", align = "center") |>
  hc_tooltip(borderColor = "black", pointFormat = "{series.name}: ${point.Avg Funding:.2f}") |>
  hc_chart(style = list(fontFamily = "serif")) |>
  hc_add_theme(hc_theme_google())

My first plot shows how much funding, on average, public library systems in different geographic areas received from each level of government in fiscal year 2023. Across all geographic areas, local governments provided the most funding. Further, average funding from every level of government decreased as the remoteness of the geographic area increased. Libraries in rural areas, which are located farthest from an urban cluster, received the least funding, on average.
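
The dominance of local funding is easier to see as a share of the total. This short base R sketch converts the averages printed in the `libfunds` output into row percentages. The matrix name `avg_funding` is hypothetical, and the percentages describe averages across systems, not any individual library.

```r
# Average funding by locale and government level, copied from the libfunds output
avg_funding <- rbind(
  City     = c(Local = 13118991, State = 1224622, Federal = 179765),
  Suburban = c(Local = 2642601,  State = 242121,  Federal = 15224),
  Town     = c(Local = 674205,   State = 120579,  Federal = 10751),
  Rural    = c(Local = 226882,   State = 29580,   Federal = 2485)
)

# Each level's share of the row total, as a percentage
shares <- round(100 * prop.table(avg_funding, margin = 1), 1)
shares
```

In every locale the local share falls between roughly 84% and 91% of average government funding, while the federal share stays below about 1.5%.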

Map

#set the latitude and longitude for the continental US
US_lat <- 38.7946
US_long <- -106.5348

#create a popup 
popuplibs <- paste0("<b>Library System: </b>", libs_ru$LIBNAME, "<br>", "<b>City: </b>", libs_ru$CITY, "<br>", "<b>State: </b>", libs_ru$STABR, "<br>", "<b> Service Area Population: </b>", libs_ru$POPU_UND, "<br>", "<b>Total Annual Library Visits: </b>", libs_ru$VISITS, "<br>", "<b>Total Books: </b>", libs_ru$BKVOL, "<br>", "<b>Total Annual Library Programs: </b>", libs_ru$TOTPRO, "<br>", "<b>FY2023 Total Funding: </b>", libs_ru$TOTFUN, "<br>", "<b>FY2023 Total Expenditures: </b>",libs_ru$TOTEXP, "<br>", "<b>FY2023 Budget Deficit or Surplus: </b>", libs_ru$DEF, "<br>")

#set the legend palette 
mapcols <- colorNumeric(palette = c( "#aada8e", "#7ba967", "#6aa84f", "#4c8533","#406c2e","#4b634b","#3d5731", "#243b2c" ), domain = libs_ru$TOTFUN)
#create the map 
leaflet() |>
  setView(lng = US_long, lat = US_lat, zoom = 3)|>
  addProviderTiles("Esri.WorldGrayCanvas") |>
  addCircles(data = libs_ru, 
             radius = sqrt(libs_ru$TOTFUN) * 20, 
             color = ~mapcols(libs_ru$TOTFUN), 
             fillOpacity = .8, stroke = FALSE, popup = popuplibs)|>
  addLegend(pal = mapcols, values = libs_ru$TOTFUN, position = "bottomleft", title = "Total Library Funding, FY2023 ($)")

## Assuming "LONGITUDE" and "LATITUDE" are longitude and latitude, respectively
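
The popup currently shows raw numbers such as funding totals without separators. As a small sketch, a helper like the hypothetical `fmt_dollars()` below could format those values as whole-dollar strings before they are pasted into the popup HTML. It uses only base R, and the sample values are invented.

```r
# Hypothetical helper: format a numeric vector as whole-dollar strings
fmt_dollars <- function(x) {
  paste0("$", formatC(round(x), format = "d", big.mark = ","))
}

# Invented values standing in for libs_ru$TOTFUN
funding <- c(258947, 1234567.89, 980)

# Build one popup line per library, as in the popup code above
popup_line <- paste0("<b>FY2023 Total Funding: </b>", fmt_dollars(funding), "<br>")
popup_line
```

The same helper could wrap `TOTEXP` and `DEF`. A negative balance would appear with a minus sign after the dollar sign, which may need separate handling.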

My map shows rural public library systems across the US. Data points that are darker green and larger represent library systems that receive more funding. The rural library systems that receive the most funding appear clustered in two regions: the DMV (the DC, Maryland, and Virginia area) and Florida. Despite their rural designation, the libraries in these two regions are likely located in or near high-income communities. For example, Loudoun County is one of the most affluent counties in Virginia, despite being more rural compared to other parts of Northern Virginia. Unsurprisingly, the Loudoun County Public Library System is one of the largest points on the map.

Conclusion

There are two main takeaways from my visualizations. First, local governments provide most of the funding for public libraries. This is consistent with how the US public library system is structured financially. Most public libraries rely on a mix of local property taxes, local budget allocations, and local government grants. Nonetheless, state and federal funding is still important, and cuts at any level will have a negative impact. Second, a library’s geographic locale (city, suburban, town, or rural) is related to how much funding it receives. Unsurprisingly, city library systems receive the most funding overall. Cities have the highest populations, which translates into a larger revenue base. Libraries in cities also have greater access to other revenue streams – NYC’s library systems, for example, receive many grants from private foundations in the city, including the Carnegie Corporation. The map is consistent with this conclusion: most rural library systems receive comparatively little funding. Most of the rural states, like Iowa, Montana, and Idaho, are dotted by small, light green data points. However, there are also outliers on the map. The Unalaska Public Library stands out the most to me, given how remote Alaska is and how little funding other libraries there receive in comparison. I was unable to find why Unalaska is such an outlier, but if I had more time, I would like to dig deeper. Overall, the outliers indicate that geographic rurality is not the sole determinant of total funding. In the future, I would like to examine the relationship between per-capita income, property values, and library funding.

As I worked on this project, I was amazed by how many services a library provides, and how much money is required to run a library. Libraries are an essential part of our public infrastructure. Like any other public service, they require funding and public support to continue serving our communities. The restoration of IMLS grants is a positive development, but the future of public library systems depends on reforms to the current funding model and greater recognition of the vital role libraries play in our society. In the meantime, I will continue supporting my local library branch.

Bibliography

Advocacy in Action. (2015, June 25). Common public library funding myths. WebJunction. https://www.webjunction.org/documents/webjunction/advocacy-in-action/common-public-library-funding-myths.html

Color hex color codes. Color Hex Codes. (n.d.). https://www.color-hex.com/

Griffin, A. (2025, November 10). Federal cuts, immigration raids and a slowing economy hit rural libraries. The New York Times. Retrieved from https://www.nytimes.com/2025/11/10/us/politics/rural-libraries.html?searchResultPosition=6.

Montilla, A. (2024, September 19). New support for New York City’s public libraries from the foundation established by Andrew Carnegie. Carnegie Corporation of New York. https://www.carnegie.org/news/articles/new-support-for-new-york-citys-public-libraries-from-the-foundation-established-by-andrew-carnegie/

Pelczar, M., Li, J., Alhassani, S., Barr, K., & Mabile, S. (2025, August). Public Libraries Survey, Fiscal Year 2023: Data File Documentation and User’s Guide. Institute of Museum and Library Services.

Public libraries survey (PLS). Institute of Museum and Library Services. (n.d.). https://www.imls.gov/research-evaluation/surveys/public-libraries-survey-pls

Schuessler, J. (2025, December 5). The New York Times. Retrieved from https://www.nytimes.com/2025/12/05/arts/imls-library-grants-trump.html?searchResultPosition=4.

Wickham, H. (n.d.). Convert string to upper case, lower case, title case, or sentence case. stringr. https://stringr.tidyverse.org/reference/case.html