#load libraries
library(tidyverse)
#import data
services_raw <- read_csv("Library_Services_20241204.csv")
cardreg_raw <- read_csv("New_Library_Card_Registration_20241204.csv")
matuse_raw <- read_csv("Total_Use_of_Library_Materials_20241204.csv")
visits_raw <- read_csv("Visits_to_Library_Branches_20241204.csv")
W14-P3-LibraryExpanded
Project 3: Library Stats
“Library Bookshelf” by Open Grid Scheduler / Grid Engine is marked with CC0 1.0.
Introduction
In July of 2021, the COVID-19 state of emergency officially ended in Maryland. While some libraries remained operational even during lockdowns, reopening expanded the services provided and allowed more people to access those services in person (Herron, 2021). Data collected since then could reveal interesting patterns as people return to using libraries in person with fewer limitations.
I’ll be exploring this through datasets collected by dataMontgomery, Montgomery County, MD, in collaboration with the Public Libraries Department. None of them have detailed documentation on methodology or data collection, other than stating that they’re updated annually. The datasets I used and relevant variables are:
Library Services: lists demographic information for each branch.
  Latitude & Longitude: for GIS
  Branch: name of library branch
New Library Card Registration: records new library card registrations in the given year and branch
  Fiscal Year: numeric year
  1 column per branch, 22 total: number of new library cards registered in the time period for each branch
Total Use of Library Materials: records number of loaned materials in the given time period
  FY/Quarter: fiscal year and quarter, as a string
  1 column per branch, 23 total (including e-content): number of materials loaned in the time period for each branch
Visits to Library Branches: records foot traffic in the given time period by branch
  FY/Quarter: fiscal year and quarter, as a string
  1 column per branch, 21 total: number of visits in the time period for each branch
Setup
Cleaning
Most of the cleaning includes pivoting into long format and making sure the library branches all follow the same naming scheme. Then, they can be joined in various configurations by the Branch variable.
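One way to find which branch names need recoding in the first place is to compare the names in two tables with an anti-join. The sketch below uses small made-up tables (`toy_visits` and `toy_services` are hypothetical stand-ins, not the real datasets) and assumes both are already in long format with a `Branch` column.

```r
library(dplyr)

# Hypothetical toy tables standing in for two of the datasets
toy_visits <- tibble(
  Branch = c("Davis (North Bethesda)", "Rockville", "White Oak"),
  Visits = c(100, 250, 175)   # made-up counts
)
toy_services <- tibble(
  Branch = c("Davis", "Rockville Memorial", "White Oak")
)

# Branch names in toy_visits that have no exact match in toy_services
unmatched <- toy_visits |>
  anti_join(toy_services, by = "Branch") |>
  distinct(Branch)

unmatched$Branch
```

Any name listed in `unmatched` is a candidate for a `recode()` entry like the ones used in the cleaning code.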
matuse <- matuse_raw |> #pivot into longer format
  pivot_longer(cols = 2:"E-content", names_to = "Branch", values_to = "Material Use") |>
  #remove duplicated rows
  distinct(.keep_all = TRUE) |>
  #change branch names to match each other
  mutate(Branch = recode(Branch,
                         "Silver Spring" = "Brigadier General Charles E. McGee",
                         "Marilyn J Praisner" = "Marilyn J. Praisner")
  )
visits <- visits_raw |> #pivot into longer format
  pivot_longer(cols = 2:"White Oak", names_to = "Branch", values_to = "Visits") |>
  #change branch names to match each other
  mutate(Branch = recode(Branch,
                         "Brigadier General Charles E. McGee (Silver Spring)"
                         = "Brigadier General Charles E. McGee",
                         "Connie Morella (Bethesda)" = "Connie Morella",
                         "Davis (North Bethesda)" = "Davis",
                         "Maggie Nightingale (Poolesville)" = "Maggie Nightingale",
                         "Marilyn J. Praisner (Burtonsville)" = "Marilyn J. Praisner",
                         "Noyes Library for Young Children" = "Noyes",
                         "Rockville" = "Rockville Memorial")
  )
cardreg <- cardreg_raw |> #pivot into longer format
  pivot_longer(cols = 2:"White Oak", names_to = "Branch", values_to = "Card Registrations") |>
  #change branch names to match each other
  mutate(Branch = recode(Branch,
                         "Brigadier General Charles E. McGee (Silver Spring)"
                         = "Brigadier General Charles E. McGee",
                         "Connie Morella (Bethesda)" = "Connie Morella",
                         "Davis (North Bethesda)" = "Davis",
                         "Maggie Nightingale (Poolesville)" = "Maggie Nightingale",
                         "Marilyn J. Praisner (Burtonsville)" = "Marilyn J. Praisner",
                         "Noyes Library for Young Children" = "Noyes",
                         "Rockville" = "Rockville Memorial"),
         #make year variable numeric
         Year = as.numeric(gsub("FY", "20", `Fiscal Year`)),
         .keep = "unused"
  )
services <- services_raw |>
  #select relevant variables
  select(Branch, Latitude, Longitude) |>
  #change branch names to match each other
  mutate(Branch = recode(Branch,
                         "Silver Spring" = "Brigadier General Charles E. McGee",
                         "Marilyn Praisner" = "Marilyn J. Praisner",
                         "Rockville" = "Rockville Memorial")
  )
#combine visits + material use for analysis, reformat the dates
vis_mat <- full_join(visits, matuse, by = join_by(Branch, `FY/Quarter`)) |>
  separate(`FY/Quarter`, c("Year", "Quarter"), sep = "-Q", convert = TRUE)
#combine again with card registrations, but now summarized by year
# > this will be used for the linear analysis
num_sum <- vis_mat |>
  group_by(Year, Branch) |>
  #drop the grouping after summarising (this also silences the summarise() message)
  summarise(Visits = sum(Visits), `Material Use` = sum(`Material Use`),
            .groups = "drop") |>
  full_join(cardreg, by = join_by(Branch, Year)) |>
  filter(Branch != "E-content" & Branch != "Correctional Facility")
#rewrite datasets for later use in tableau
write_csv(cardreg, "Cards.csv")
write_csv(vis_mat, "QuarterlySums.csv")
write_csv(services, "Services.csv")
Multiple Linear Regression: Predicting New Library Card Registrations
#create model
lib_lm <- lm(`Card Registrations` ~ `Material Use` + Visits + Year,
data = num_sum)
lib_lm |> summary()
Call:
lm(formula = `Card Registrations` ~ `Material Use` + Visits +
    Year, data = num_sum)

Residuals:
    Min      1Q  Median      3Q     Max
-4046.3 -2552.1  -614.9  1226.0 16685.9

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)    -7.930e+06  2.851e+06  -2.781  0.00838 **
`Material Use`  4.278e-02  1.810e-02   2.364  0.02331 *
Visits         -8.641e-03  8.673e-03  -0.996  0.32542
Year            3.921e+03  1.409e+03   2.782  0.00836 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3902 on 38 degrees of freedom
  (21 observations deleted due to missingness)
Multiple R-squared:  0.262,  Adjusted R-squared:  0.2037
F-statistic: 4.497 on 3 and 38 DF,  p-value: 0.00852
plot(lib_lm) #diagnostic plots
The model equation is as follows:
Card Registrations = -7930000 + 0.0428(Material Use) - 0.0086(Visits) + 3921(Year)
While the p-value = 0.0085 for the entire model is statistically significant, the adjusted R² = 0.2037 means that, after adjusting for the number of predictors, these variables explain only about 20.37% of the variation in new library card registrations (the unadjusted R² is 0.262). The diagnostic plots also show a few outliers that might make the model less reliable, notably observations 43, 49, and 22.
Material use (coefficient 0.0428; p = 0.023) has a positive, statistically significant relationship with new card registrations. Year (coefficient 3921; p = 0.008) has a larger coefficient and a stronger level of significance, although the two coefficients are on different scales (per item loaned versus per year), so their sizes are not directly comparable. In contrast, visits (coefficient -0.0086; p = 0.33) has a small negative coefficient that is not statistically significant.
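The gap between the Multiple R-squared and Adjusted R-squared lines in the summary output comes from the standard penalty for the number of predictors. As a quick arithmetic check, the sketch below recomputes the adjusted value from figures printed in that output; the variable names are only illustrative.

```r
# Values read off the model summary output
r2       <- 0.262   # Multiple R-squared
df_resid <- 38      # residual degrees of freedom
p        <- 3       # predictors: Material Use, Visits, Year

# Complete observations used in the fit (residual df + predictors + intercept)
n <- df_resid + p + 1

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid
round(adj_r2, 4)
```

This gives n = 42 complete observations and reproduces the reported adjusted value of 0.2037.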
Strangely, recorded foot traffic does not appear to have a significant effect on new library card registrations, and the negative coefficient is an unusual direction for this relationship, given that registering requires being at the library in person (MCPL, n.d.). Removing the “visits” variable changed the model very little, so I chose to keep it and highlight the odd finding.
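One way to back up a decision like this is to fit the model with and without the variable and compare the two fits. The sketch below does that on simulated data: the data frame `toy` and all of its columns are made up so the block runs on its own, and none of the values come from the library datasets. With the real data, `lib_lm` would take the place of `full`.

```r
set.seed(1)

# Simulated stand-in for num_sum; values are invented for illustration only
toy <- data.frame(
  material_use = runif(40, 50000, 400000),
  visits       = runif(40, 20000, 300000),
  year         = rep(2022:2023, each = 20)
)
toy$card_reg <- 1000 + 0.04 * toy$material_use + rnorm(40, sd = 3000)

full    <- lm(card_reg ~ material_use + visits + year, data = toy)
reduced <- update(full, . ~ . - visits)   # same model without visits

# Partial F-test: does dropping visits significantly worsen the fit?
anova(reduced, full)

# Adjusted R-squared of both models side by side
c(full    = summary(full)$adj.r.squared,
  reduced = summary(reduced)$adj.r.squared)
```

Because only one term is dropped, the F-test p-value matches the t-test p-value for that term in the full model's summary; the adjusted R² comparison shows how much explanatory power the term adds.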
Visualizations:
The three visualizations work together to show how the libraries were used from mid-2021 to mid-2023. Generally, usage of both the materials and the physical space has increased.
Quarterly Foot Traffic
This series of boxplots summarizes foot traffic to the libraries over time. There is a predictable increase going into 2022, after which the values remain relatively stable. Without data from before or during the COVID lockdowns, it’s not possible to compare traffic to earlier numbers, but the fact that visits increased and are now stable should be a good sign.
Total Material Usage
The treemap/bar graph highlights how many materials are loaned by libraries over time. This visualization shows a general increase in material use overall. Q3-4 of 2022 stand out as the only period in which usage decreased, and further investigation may be needed to understand why. The chart also shows how widely the digital library is used, at almost the same level as all other library branches combined. Again, it would be interesting to compare with earlier data, if that data existed, to see whether lockdowns had an effect on the use of e-content.
Traffic & Usage
This map combines foot traffic and material use, with points colored not by total loans but by the ratio of material loans to visitors. Scaling the points by foot traffic and mapping them geographically reveals insights the previous graphs did not. For example, libraries with higher visitor counts fluctuated less over time, while less popular libraries had more variation in their visitor numbers. These more popular libraries also loan proportionally fewer materials, suggesting that more people may be visiting for reasons unrelated to borrowing.
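The color scale on the map is based on loans per visit. The map itself was built from the exported CSV files, so the following is only a minimal sketch of how that ratio could be computed in R, assuming a table shaped like `vis_mat` with `Visits` and `Material Use` columns. The toy rows and the `loans_per_visit` column name are made up for illustration.

```r
library(dplyr)

# Hypothetical rows in the same shape as vis_mat (invented values)
toy_vis_mat <- tibble(
  Branch         = c("Branch A", "Branch B"),
  Visits         = c(50000, 10000),
  `Material Use` = c(60000, 30000)
)

# Ratio of materials loaned to visits, one value per row
toy_ratio <- toy_vis_mat |>
  mutate(loans_per_visit = `Material Use` / Visits)

toy_ratio
```

In this toy case the busier branch has the lower ratio (1.2 loans per visit versus 3), which is the kind of pattern described above.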
References
Herron, P. (2021, May 10). Montgomery County announces plans to reopen libraries. The MoCo Show. https://mocoshow.com/2021/05/10/montgomery-county-announces-plans-to-reopen-libraries/
MCPL. (n.d.). Your MCPL Library Card. Montgomery County Public Libraries (MCPL), Montgomery County, Maryland. https://montgomerycountymd.gov/library/services/card.html
Open Grid Scheduler / Grid Engine. (n.d.). Library Bookshelf. Flickr. Retrieved December 15, 2024, from https://www.flickr.com/photos/29155878@N03/22468805072