library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(MASS)
##
## Attaching package: 'MASS'
##
## The following object is masked from 'package:dplyr':
##
## select
library(readxl)
library(pastecs)
##
## Attaching package: 'pastecs'
##
## The following objects are masked from 'package:dplyr':
##
## first, last
##
## The following object is masked from 'package:tidyr':
##
## extract
dataset<-read.csv("state_data_combo.csv")
library_visit_rate<-dataset$Library.Visits.Per.Capita
poverty_rate<-dataset$Poverty.Rate....
unemployment_rate<-dataset$Unemployment.rate.2019
no_computer_ownership<-dataset$Percent.with.no.home.computer..2018.
broadband_rate<-dataset$Percent.with.home.Broadband
linear_model<-lm(library_visit_rate~poverty_rate+unemployment_rate+no_computer_ownership+broadband_rate,data=dataset)
summary(linear_model)
##
## Call:
## lm(formula = library_visit_rate ~ poverty_rate + unemployment_rate +
## no_computer_ownership + broadband_rate, data = dataset)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.0579 -0.5898 -0.0346 0.4654 1.7209
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.35273 8.06680 0.911 0.367
## poverty_rate -0.09598 0.07820 -1.227 0.226
## unemployment_rate -0.01574 0.11185 -0.141 0.889
## no_computer_ownership -0.04981 0.09756 -0.511 0.612
## broadband_rate -0.03848 0.08317 -0.463 0.646
##
## Residual standard error: 0.7014 on 46 degrees of freedom
## Multiple R-squared: 0.1398, Adjusted R-squared: 0.06496
## F-statistic: 1.868 on 4 and 46 DF, p-value: 0.1321
The adjusted R-squared is .06496, which means that this linear regression model explains only about 6.5% of the variance in library visits per capita, after adjusting for the number of predictors.
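As a quick check on that figure, the adjusted R-squared can be reproduced from numbers in the summary output. This is only a sketch: the variable names are my own, and the count of 51 observations is inferred from the 46 residual degrees of freedom plus the 5 estimated coefficients.

```r
# Values copied from the summary(linear_model) output above
r_squared    <- 0.1398
n_predictors <- 4

# Inferred, not read from the data: 46 residual df + 4 slopes + 1 intercept
n_obs <- 46 + n_predictors + 1

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)
print(round(adj_r_squared, 4))
```

The result agrees with the reported .06496 up to the rounding of the R-squared value used as input.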
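The p-value of the overall F-test is 0.1321, which is above the usual .05 threshold. I therefore cannot reject the null hypothesis that all of the slope coefficients are zero: the data do not provide strong evidence that the independent variables I selected, taken together, explain the dependent variable. This is not the same as showing that they have no influence.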
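The overall p-value comes from the F-statistic, which can itself be rebuilt from the R-squared and the degrees of freedom. The following is a minimal sketch using values copied from the output; the variable names are my own.

```r
# Values copied from the summary(linear_model) output above
r_squared <- 0.1398
df_model  <- 4    # number of slope coefficients
df_resid  <- 46   # residual degrees of freedom

# Overall F-statistic: explained variation per predictor divided by
# unexplained variation per residual degree of freedom
f_stat <- (r_squared / df_model) / ((1 - r_squared) / df_resid)

# The p-value is the upper-tail probability of the F distribution
p_value <- pf(f_stat, df_model, df_resid, lower.tail = FALSE)
print(c(F = f_stat, p = p_value))
```

Both numbers should land close to the reported F-statistic of 1.868 and p-value of 0.1321, with small differences due to rounding of the inputs.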
None of my independent variables is statistically significant, as all of their p-values are well above the conventional .05 threshold. Poverty rate comes closest, with a p-value of .226.
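To show where a coefficient p-value such as .226 comes from, the sketch below recomputes the t-value, two-sided p-value, and a 95% confidence interval for the poverty rate coefficient from its estimate and standard error. The numbers are copied from the output; the variable names are my own.

```r
# Poverty rate row copied from the coefficient table above
estimate  <- -0.09598
std_error <- 0.07820
df_resid  <- 46

# t-value is the estimate measured in standard errors
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(abs(t_value), df_resid, lower.tail = FALSE)

# 95% confidence interval for the coefficient
conf_int <- estimate + c(-1, 1) * qt(0.975, df_resid) * std_error

print(c(t = t_value, p = p_value))
print(conf_int)
```

The interval spans zero, which is another way of seeing that the coefficient is not significant at the .05 level.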
The most surprisingly insignificant variable is the unemployment rate. Anecdotally, the local libraries I have used offer programs that help unemployed citizens with job applications, and they provide computer stations for editing resumes and browsing online job postings.
many_iv_model<-lm(Total.Circulation.Per.Capita~Unemployment.rate.2019+Poverty.Rate....+Percent.with.no.home.computer..2018.+Percent.with.home.Broadband,data=dataset)
summary(many_iv_model)
##
## Call:
## lm(formula = Total.Circulation.Per.Capita ~ Unemployment.rate.2019 +
## Poverty.Rate.... + Percent.with.no.home.computer..2018. +
## Percent.with.home.Broadband, data = dataset)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8218 -1.4858 -0.1292 1.5521 6.7124
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.2410 24.3769 0.051 0.960
## Unemployment.rate.2019 -0.2501 0.3380 -0.740 0.463
## Poverty.Rate.... 0.1244 0.2363 0.527 0.601
## Percent.with.no.home.computer..2018. -0.3378 0.2948 -1.146 0.258
## Percent.with.home.Broadband 0.1009 0.2513 0.402 0.690
##
## Residual standard error: 2.12 on 46 degrees of freedom
## Multiple R-squared: 0.2532, Adjusted R-squared: 0.1882
## F-statistic: 3.899 on 4 and 46 DF, p-value: 0.008279
Some of the independent variables have positive coefficients, meaning they are estimated to increase circulation per capita in the nation’s library systems, holding the other variables constant. For instance, if a state's poverty rate increased by one percentage point, circulation per capita would be estimated to increase by .1244. Likewise, if the percentage of people with home broadband increased by one percentage point, circulation per capita would be estimated to rise by .1009.
Additionally, some of the independent variables are estimated to decrease circulation per capita. A one-percentage-point increase in the unemployment rate or in the percentage of the population without a home computer is estimated to lower circulation per capita by .2501 and .3378 respectively. None of these four coefficients is individually significant at the .05 level, so these estimates should be read with caution.
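To make the "one percentage point" reading concrete, the sketch below plugs the fitted coefficients into the regression equation for a made-up state and then raises the poverty rate by one point. The coefficients are copied from the output; the function name `predict_circulation` and the example input values are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the summary(many_iv_model) output above
coefs <- c(intercept    =  1.2410,
           unemployment = -0.2501,
           poverty      =  0.1244,
           no_computer  = -0.3378,
           broadband    =  0.1009)

# Hypothetical helper: predicted circulation per capita from the fitted equation
predict_circulation <- function(unemployment, poverty, no_computer, broadband) {
  unname(coefs["intercept"] +
         coefs["unemployment"] * unemployment +
         coefs["poverty"]      * poverty +
         coefs["no_computer"]  * no_computer +
         coefs["broadband"]    * broadband)
}

# Made-up state profile (percentages), not taken from the dataset
baseline     <- predict_circulation(4, 12, 10, 80)
more_poverty <- predict_circulation(4, 13, 10, 80)

# Change in predicted circulation for a one-point rise in poverty
print(more_poverty - baseline)
```

Because the other three inputs are held fixed, the difference is simply the poverty coefficient, .1244, whatever baseline values are used.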
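At first glance, the coefficients for broadband access and for lacking a home computer having opposite signs is surprising, as one would expect those two variables to be strongly correlated. However, the first measures access and the second measures the lack of it, so opposite signs are consistent: both suggest that more home technology access goes with higher circulation.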
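One way to check whether two predictors overlap this heavily is to look at their correlation and at a variance inflation factor (VIF). The sketch below uses simulated data, not my dataset: the variable names, the sample size of 51, and the assumed relationship between the two predictors are all made up for illustration.

```r
set.seed(42)
n <- 51

# Simulated predictors (NOT the real state data)
no_computer <- runif(n, 5, 20)
# Assumption for illustration: broadband falls as the share without a computer rises
broadband   <- 95 - 1.2 * no_computer + rnorm(n, sd = 2)
poverty     <- runif(n, 8, 20)

# Pairwise correlation between the two technology variables
tech_cor <- cor(no_computer, broadband)
print(tech_cor)

# VIF for broadband: regress it on the other predictors, then 1 / (1 - R^2)
aux_fit       <- lm(broadband ~ no_computer + poverty)
vif_broadband <- 1 / (1 - summary(aux_fit)$r.squared)
print(vif_broadband)
```

On the real data, the same correlation check would be `cor(dataset$Percent.with.no.home.computer..2018., dataset$Percent.with.home.Broadband)`. A strong negative correlation or a large VIF would suggest that the two coefficients are hard to estimate separately, which inflates their standard errors.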
plot(linear_model,which=1)
Looking at this residuals-versus-fitted plot, I would interpret my model as having violated the assumption of linearity. The residuals do not appear to follow a flat, linear pattern, and their spread does not appear to be sufficiently constant across the fitted values, so the homoscedasticity assumption looks doubtful as well.
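A visual reading of this plot can be backed up with a formal test for heteroscedasticity. The sketch below computes a studentized Breusch-Pagan statistic by hand on simulated data (not my dataset), where the noise is deliberately made to grow with the predictor. The variable names and the simulated relationship are assumptions for illustration.

```r
set.seed(1)
n <- 51

# Simulated heteroscedastic data: the noise gets larger as x grows
x <- runif(n, 1, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 0.3 * x)
fit <- lm(y ~ x)

# Auxiliary regression: squared residuals on the predictor
squared_resid <- resid(fit)^2
aux_fit <- lm(squared_resid ~ x)

# Test statistic is n * R^2 of the auxiliary regression,
# compared against a chi-squared with df = number of predictors
bp_stat <- n * summary(aux_fit)$r.squared
bp_p    <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
print(c(BP = bp_stat, p = bp_p))
```

A small p-value would be evidence against constant residual variance. Since `lmtest` is already loaded, `bptest(linear_model)` should run a test of this kind directly on my model; I believe its default studentized form corresponds to the calculation above.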