Research Paper Data Selection

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

For my research paper, I have chosen a data set containing various key performance indicators (KPIs) of library services and library use for the fifty states plus Washington, DC. I have also added several economic indicators for each state to that data set. Both data sets were created and made public by the Institute of Museum and Library Services (IMLS), an independent government agency that is the primary overseer of federal support and policy for the country’s museums and libraries. With this data I aim to explore the relationship between a state’s socioeconomic situation and the availability and use of its library resources.

research_data <- read.csv("state_data_combo.csv")

Here is a summary of some of the variables contained within the data set that I am most interested in.

summary(research_data$Poverty.Rate....)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    7.90   11.15   13.70   13.65   15.75   20.80

summary(research_data$Percent.with.no.home.Internet..2018.)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   13.10   16.80   19.10   19.72   21.35   31.50

summary(research_data$Library.Visits.Per.Capita)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.600   1.770   2.290   2.330   3.025   3.820

summary(research_data$Total.Circulation.Per.Capita)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.640   4.365   5.460   6.038   7.535  12.170

summary(research_data$Registered.Users.Per.Capita)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2200  0.4100  0.5000  0.4875  0.5600  0.7500

The data set has no missing values.
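One way to back up a statement like this is to count the missing values in each column. The sketch below uses a small made-up data frame (`toy_data`, with invented values, not the IMLS data) so that it runs on its own; the same two calls could be applied to `research_data`.

```r
# Hypothetical toy data standing in for research_data (all values are invented)
toy_data <- data.frame(
  State = c("A", "B", "C"),
  Poverty.Rate = c(10.2, NA, 14.8),
  Library.Visits.Per.Capita = c(2.1, 3.0, 1.7)
)

# Count the missing values in each column
missing_per_column <- colSums(is.na(toy_data))
print(missing_per_column)

# TRUE if any value anywhere in the data frame is missing
has_missing <- anyNA(toy_data)
print(has_missing)
```

For a complete data set, every count would be zero and `anyNA()` would return FALSE.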

Here are histograms showing how each of those variables is distributed across the states.

hist(research_data$Poverty.Rate....)

hist(research_data$Percent.with.no.home.Internet..2018.)

hist(research_data$Library.Visits.Per.Capita)

hist(research_data$Total.Circulation.Per.Capita)

hist(research_data$Registered.Users.Per.Capita)

At first glance, the selected variables appear to be roughly normally distributed across the states, but I will examine that property of the data more formally later.
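As a possible next step for that check, the Shapiro-Wilk test in base R gives a formal complement to eyeballing a histogram. This is only a sketch under stated assumptions: `simulated_rate` is randomly generated stand-in data (51 values, one per state plus DC), not a column from the IMLS file, and Shapiro-Wilk is one of several tests that could be used here.

```r
# Simulated stand-in for one variable (NOT the real data): 51 draws,
# one per state plus DC, from a normal distribution
set.seed(42)
simulated_rate <- rnorm(51, mean = 13.65, sd = 3)

# Shapiro-Wilk test: the null hypothesis is that the sample is normally distributed
normality_result <- shapiro.test(simulated_rate)
print(normality_result)

# A small p-value (for example, below 0.05) would be evidence against normality
p_value <- normality_result$p.value

# For a visual check, qqnorm(simulated_rate) followed by qqline(simulated_rate)
# draws a normal Q-Q plot with a reference line
```

With the real data, the first argument would be a column such as `research_data$Library.Visits.Per.Capita`.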

Here are some sample plots that show the relationship between some of the economic metrics and library use KPIs:

plot(research_data$Poverty.Rate....,research_data$Registered.Users.Per.Capita)

plot(research_data$Percent.with.no.home.Internet..2018.,research_data$Library.Visits.Per.Capita)

plot(research_data$Percent.with.no.home.computer..2018.,research_data$Children.s.Material.Circulation.Percentage)

Here are the correlations for the relationships shown in the plots above:

cor(research_data$Poverty.Rate....,research_data$Registered.Users.Per.Capita)

## [1] 0.1589795

cor(research_data$Percent.with.no.home.Internet..2018.,research_data$Library.Visits.Per.Capita)

## [1] -0.3044928

cor(research_data$Percent.with.no.home.computer..2018.,research_data$Children.s.Material.Circulation.Percentage)

## [1] -0.2514911

None of these correlations is strong: their absolute values range from roughly 0.16 to 0.30. I will dig into these relationships further, and I will also test the relationships among the other variables in the data set.
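One way to judge whether a weak correlation is distinguishable from zero is `cor.test()`, which reports a p-value and a confidence interval alongside the coefficient. The sketch below is illustrative only: `no_internet` and `visits` are simulated vectors with an invented relationship, not the IMLS columns, and the Pearson method is an assumption chosen to match the default used by `cor()`.

```r
# Simulated stand-ins for two variables (NOT the real data), 51 observations each
set.seed(123)
no_internet <- rnorm(51, mean = 19.7, sd = 3.5)
visits <- 3.5 - 0.06 * no_internet + rnorm(51, sd = 0.7)

# Pearson correlation test: the null hypothesis is that the true correlation is zero
test_result <- cor.test(no_internet, visits, method = "pearson")
print(test_result)

# Pull out the pieces that are usually reported
r_estimate <- unname(test_result$estimate)    # same value as cor(no_internet, visits)
p_value <- test_result$p.value                # significance of the correlation
conf_interval <- test_result$conf.int         # 95% confidence interval for r
```

With the real data, the two arguments would be columns of `research_data`, for example the home Internet and library visits variables plotted above.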

Tyler Schulze

2025-02-26