About Datasets:

These datasets have the following information :

Name of all public schools in the United States and what district they belong to.
Name of all school districts in the United States and their expenses spent on operations per student between 2000 and 2018 and their revenue from federal, state and local sources.
Name of all school districts on the population between 5-17(school age) living in poverty as a percentage of the total population of school age people.
All school gun incidents between 2000-2018 in the United States with victim count.

Content of the Dataset:

The variable names are self explanatory but for further investigation please check https://educationdata.urban.org/documentation/schools.html#overview and https://www.chds.us/ssdb/

Purpose of this report:

The number of gun incidents in the United States have gone up since the 2000s. These incidents can either be with an active shooter or a non-active shooter. The Department of Homeland Security defines an active shooter as an individual actively engaged in killing or attempting to kill people in a confined and populated area; in most cases, active shooters use firearms and there is no pattern or method to their selection of victims whereas a non-active shooter event may be as little as a gun accident,it may be a fight between individuals or even gang related with a clear target and motive. The purpose of this research is to see if there is a relationship between the effect of gun incidents and public school districts security expenditures. The NCES defines Operation and Maintenance of Plant expenses as “Activities concerned with keeping the physical plant open, comfortable, and safe for use and with keeping the grounds, buildings, and equipment in effective working condition and state of repair. These include the activities of maintaining safety in buildings, on the grounds, and in the vicinity of schools.” It should be noted that this does not cover in-service training for security employees however it is clear that this represents of the bulk of the security budget. Operation and Maintenance is also a subset of Support Services. Which accounts for approximately 35% of a district’s budget. The rest being allocated towards for Instruction. We will be exploring this relationship with operation and maintenance of plant expenses as raw numbers and also as a percentage of support services expenses.

Data Source:

Data was obtained from the Urban Institute’s API that draws data from national from most major national data sources on schools, districts, and colleges. More information can be found here. https://educationdata.urban.org/data-explorer/about

Limitations:

The biggest limitation is the data as security services is only a subset Operation and Maintenance expenses. There is also no data on a per school basis. So the change may have been large on the security expenses of a school but it could be the change was not large enough to influence the total Operation and Maintenance expenses or the school’s total budget. To check the accounting standards set by the NCES, more information can be found here.

https://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2015347

Part One: Data Preparation:

First we must communicate with the API itself. The Urban Institute provides us the the code for R and lets us select the data and then we will save it in csv shape. The code for R is provided below:

poverty_data <- get_education_data(level = “school-districts”, source = “saipe”) write.csv(poverty_data, “poverty.csv”)

directory <- get_education_data(level = “school-districts”, source = “ccd”, topic = “directory”) write.csv(directory, “district_directory_main.csv”)

data <- get_education_data(level = “school-districts”, source = “ccd”, topic = “finance”) write.csv(data, “district_finance_data.csv”)

district_enrollment <-get_education_data(level = “school-districts”, source = “ccd”, topic = “enrollment”) write.csv(district_enrollment, “district_enrollment.csv”)

data <- get_education_data(level = “schools”, source = “ccd”, topic = “directory”) write.csv(data, “all_schools.csv”)

Secondly, the data between 1999 and 2018 seems to be the most accurate and full. Data outside of that range is incomplete about schools and at times does not have schools registered, hence our data will also be within our range.

#installing libraries
library(educationdata)
library(skimr)
library(stats)
library(QuantPsyc)
library(boot)
library(openxlsx)
library(janitor)
library(tidyverse)
library(sqldf)
library(vroom)
library(readxl)
library(lubridate)

#district Directory has all districts with their addresses which will help us in joining data later with our gun incidents data.
district_directory <- vroom("district_directory_main.csv")

## New names:
## Rows: 622669 Columns: 70
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (18): leaid, lea_name, state_leaid, street_mailing, city_mailing, state_... dbl
## (52): ...1, year, fips, latitude, longitude, urban_centric_locale, cbsa,...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`

all_schools <- vroom("schools_ccd_directory.csv")

## Rows: 3381565 Columns: 52
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (18): ncessch, school_id, school_name, leaid, lea_name, state_leaid, sea...
## dbl (34): year, ncessch_num, fips, latitude, longitude, csa, cbsa, urban_cen...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

#District finance has data for the districts budgets
district_finance <- vroom("district_finance_data.csv")

## New names:
## Rows: 445037 Columns: 135
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (2): leaid, censusid dbl (133): ...1, year, fips, rev_total, rev_fed_total,
## rev_fed_child_nutriti...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`

#Selecting the columns we want. The year of budget, the plant operation expenses, revenue from federal,state and local sources and support service expenses. First, we are taking all districts with more than 20 counts. This ensures that our data has to be at least within 1999 and 2018. we then filter between our range and then only take those districts with a count of 20. This ensures us that the districts left have complete data between our range. The same will be done to district enrollment as well.
district_finance <- district_finance %>%
  dplyr::select(year,leaid,exp_current_operation_plant,rev_fed_total,rev_state_total,rev_local_total,exp_current_supp_serve_total) %>%
  group_by(leaid) %>%
  filter(n() >= 20) %>%
  filter(year >= 1999) %>%
  filter(year <= 2018) %>%
  filter(n() == 20) %>%
  arrange(leaid)
district_enrollment <- vroom("district_enrollment.csv")

## New names:
## Rows: 7427509 Columns: 8
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): leaid dbl (7): ...1, year, fips, grade, race, sex, enrollment
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`

 district_enrollment <- district_enrollment %>%
  filter(grade == 99) %>%
  group_by(leaid) %>%
  filter(n() >= 20) %>%
  filter(year >= 1999) %>%
  filter(year <= 2018) %>%
  filter(n() == 20) %>%
  arrange(leaid) 
 #We are joining the district finance data with enrollment using the leaid and year and then finding out the spending numbers per student and finding plant operation expenses as a percentage of support service expenses
district_finance_data <- left_join(district_finance, district_enrollment, by = c("leaid", "year")) %>%
  drop_na() %>%
  mutate(operation_expenses_per_capita = round(exp_current_operation_plant/enrollment),rev_fed_total_per_capita = round(rev_fed_total/enrollment), rev_state_total_per_capita = round(rev_state_total/enrollment), rev_local_total_per_capita = round(rev_local_total/enrollment), opereations_expenses_percentages = exp_current_operation_plant/exp_current_supp_serve_total)
district_directory <- vroom("district_directory_main.csv")

## New names:
## Rows: 622669 Columns: 70
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (18): leaid, lea_name, state_leaid, street_mailing, city_mailing, state_... dbl
## (52): ...1, year, fips, latitude, longitude, urban_centric_locale, cbsa,...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`

district_directory

district_directory <- district_directory  %>%
  group_by(leaid) %>%
  filter(n() >= 20) %>%
  filter(year >= 1999) %>%
  filter(year <= 2018) %>%
  filter(n() == 20) %>%
  arrange(leaid) %>%
  mutate(across(.cols = c(lea_name,street_location, city_location, state_location), .fns = tolower)) %>%
  dplyr::select(year,leaid,street_location,city_location,state_location) 
#Now we will have a dataset for all districts within our range with their bdugets and contact information. 
all_districts <- left_join(district_finance_data,district_directory, by = c("leaid", 'year')) %>%
  drop_na()
poverty_data <- vroom("poverty.csv")

## New names:
## Rows: 303149 Columns: 11
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (3): district_id, district_name, leaid dbl (8): ...1, est_population_total,
## est_population_5_17, est_population_5_1...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`

poverty_data <- poverty_data %>%
  group_by(leaid) %>%
  filter(n() >= 20) %>%
  filter(year >= 1999) %>%
  filter(year <= 2018) %>%
  filter(n() == 20) %>%
  arrange(leaid) %>%
  dplyr::select(year,leaid,est_population_5_17_poverty, est_population_5_17_poverty_pct) 
#Joining with all districts to have poverty measures as well.
all_districts <- left_join(all_districts,poverty_data, by = c("leaid", "year")) %>%
  drop_na()
#There are some districts that are missing their leaids in between. As in having their leaid between 2005 and 2010 but 2011. So I am having leaids with value -1(indicator for missing) being replaced by an leaid if the leaid the row above and row below are the same.
all_districts <- as.data.frame(apply(all_districts, 2, function(x){
  #find rows with x
  us<-which(x=='-1')
  #replace x with value above (if above=below)
  x[us]<-ifelse(x[us-1]==x[us+1], x[us-1], '-1')
  return(x)
  }))
all_schools <- all_schools %>%
  dplyr::select(1,4,5,6,14,16) %>%
  filter(year == 2018) %>%
  drop_na() %>%
  group_by(school_id) %>%
  distinct()

#The School Shooter Database Data is in an excel workbook with multiple sheets so reading these.
sheet_names <- excel_sheets("SSDB_Raw_Data.xlsx")
list_all <- lapply(sheet_names, function(x) {          # Read all sheets to list
  as.data.frame(read_excel("SSDB_Raw_Data.xlsx", sheet = x)) } )
#after this just make seperate dataframe for each frame
names(list_all) <- sheet_names
#Making the incident  and victim sheet into the incident and incident dataframe.
incident <- list_all$INCIDENT
victims <- list_all$VICTIM
incident <- clean_names(incident)
victims <-clean_names(victims)

#Selecting the variables from incident dataframe
incident <- incident %>%
  dplyr::select(incident_id,date,school,city,state,school_level,active_shooter_fbi,situation)
#finding out the total victim count in each incident 
victims <- victims %>%
  group_by(incidentid) %>%
  mutate(count = n())
victims <- victims %>%
  dplyr::select(incidentid, count)
#Changing column names
names(incident)[names(incident) == 'incident_id'] <- 'incidentid'

#joining incident and victims dataframe to have the victim count for each incident. 
incident_victim <- left_join(incident,victims, by = "incidentid")
incident_victim[is.na(incident_victim)] <- 0
incident_victim <- incident_victim[!duplicated(incident_victim), ]
#getting the date only from incident's date using lubridate package
incident_victim$year_of_incident <- year(incident_victim$date)
incident_victim <- incident_victim %>%
  dplyr::select(-date)
names(incident_victim)[names(incident_victim) == 'count'] <- 'victim_count'
names(incident_victim)[names(incident_victim) == 'school'] <- 'school_name'
names(incident_victim)[names(incident_victim) == 'city'] <- 'city_location'
names(incident_victim)[names(incident_victim) == 'state'] <- 'state_location'
incident_victim

incident_victim$city <- tolower(incident_victim$city_location)
incident_victim$state <- tolower(incident_victim$state_location)
incident_victim$school_name <- tolower(incident_victim$school_name)
incident_victim$state_location <- tolower(incident_victim$state_location)
#filtering for events between 1999 and 2018
incident_victim <- incident_victim %>%
  filter(year_of_incident >= 1999 & year_of_incident <= 2018)
#To join data between all schools and the schools with incident. There was no leaid and the names in the all schools in incident data was longer than the one in the all_schools data. For example, "Willow Woods Elementary school" vs "Willow Woods School" hence we remove the spaces between them first.
remove <- " "
removal <- function(x) {
  str_remove_all(x,paste(remove,collapse = "|"))
}
incident_victim$school_name_one <- removal(incident_victim$school_name)
all_schools$school_name_two <- removal(all_schools$school_name) %>% tolower()
all_schools$state_location <- tolower(all_schools$state_location)

#Uinsg Sqldf package and SQL to join the schools with incidents and all schools. We are using their name and state location. 
districts_with_incidents <- sqldf("SELECT * from incident_victim
LEFT JOIN all_schools ON incident_victim.school_name_one LIKE ('%' || all_schools.school_name_two ||'%') AND (incident_victim.state_location = all_schools.state_location)")
#As our data is on the district table, we can filter data by having only unique incident ids 
districts_with_incidents <-districts_with_incidents %>%
  dplyr::select(incidentid,school_level,victim_count,year_of_incident,leaid, school_name_one, city,active_shooter_fbi) %>%
  group_by(incidentid) %>%
  filter(n() == 1) %>%
  ungroup() %>%
  arrange(leaid,incidentid) %>%
  filter()
#joining districts with incidents and district finance data
districts_incidents_budget <- left_join(districts_with_incidents, district_finance_data, by = "leaid") %>%
  drop_na() %>%
  group_by(incidentid) %>%
  dplyr::select(incidentid,school_level,victim_count,year_of_incident,leaid,year,operation_expenses_per_capita,rev_fed_total_per_capita,rev_state_total_per_capita,rev_local_total_per_capita,active_shooter_fbi,opereations_expenses_percentages)
names(districts_incidents_budget)[names(districts_incidents_budget) == "year"] <- 'budget_year'
final_dataset <- districts_incidents_budget

#creating a dummy variable for years when the incident happened
final_dataset$incident_dummy <- ifelse(final_dataset$budget_year == final_dataset$year_of_incident,1,0)
#now adjusting values for inflation for federal, state and local revenue. Cpi data was obtained from Bureau of Labor Statistics
#cpi formula = value in 2021 dollars = value in x year's dollars * cpix/cpi in 2021
final_dataset

cpi <- read.csv("cpi.csv")

yearly_cpi <- cpi %>%
 rowwise() %>%
  mutate(annual_cpi = sum(c_across(all_of(3:14)))/12) %>%
  dplyr::select(Year, annual_cpi) %>%
  clean_names()
names(yearly_cpi)[names(yearly_cpi) == "year"] <- 'budget_year'

cpi_2021 <- 270.9698


final_dataset <- left_join(final_dataset,yearly_cpi,by = "budget_year")

#adjusting for inflation and making the expenses unit 1000s of dollars
final_dataset <-  final_dataset %>%
  mutate(operation_expenses_per_capita = operation_expenses_per_capita*cpi_2021/(annual_cpi),rev_fed_total_per_capita = rev_fed_total_per_capita*cpi_2021/(annual_cpi), rev_state_total_per_capita = rev_state_total_per_capita*cpi_2021/(annual_cpi),rev_local_total_per_capita = rev_local_total_per_capita*cpi_2021/(annual_cpi)) %>%
   dplyr::select(-annual_cpi)
final_dataset

names(poverty_data)[names(poverty_data) == "year"] <- 'budget_year'

#finding and adding change in budget compoments in dataframe


#Joining with poverty dataset
final_dataset <-left_join(final_dataset,poverty_data,by = c("leaid", "budget_year")) %>%
  drop_na()
#finding out the victims per incident
final_dataset <- final_dataset %>%
  mutate(victim_count_from_incident = victim_count*incident_dummy) 
#Making a dummy variable for if it was an active shooter incident or not.
x <- ifelse(final_dataset$active_shooter_fbi == "1" & final_dataset$victim_count_from_incident > 0,1,0)
final_dataset$active_shooter_fbi <- x
final_dataset <- final_dataset %>%
  dplyr::select(-victim_count)
final_dataset <- final_dataset %>%
  ungroup()
  
final_dataset

Part Two: Analysis

To explore the relationship between school building operation expenses and revenue from federal, state and local sources. Our first linear model will be the following $Y = b_{0} + b_{1}x_{1} + b_{2}x_{2} + b_{3}x_{3}$ Where: $Y$ is the school building operation expenses per student and $b_{0}$ is the intercept. $x_{1}$ is the total revenue from all federal sources per student. $x_{2}$ is the total revenue from all state sources per student. $x_{3}$ is the total revenue from all local sources per student.

final_dataset %>%
  group_by(budget_year) %>%
  summarise(mean = mean(operation_expenses_per_capita)) %>%
 ggplot(aes(budget_year,mean)) +
  geom_point() +
  stat_smooth()

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

final_dataset %>%
  group_by(budget_year) %>%
  summarise(mean = mean(operation_expenses_per_capita))

model <- lm(operation_expenses_per_capita ~ rev_fed_total_per_capita + rev_state_total_per_capita + rev_local_total_per_capita , data = final_dataset)
summary(model)

## 
## Call:
## lm(formula = operation_expenses_per_capita ~ rev_fed_total_per_capita + 
##     rev_state_total_per_capita + rev_local_total_per_capita, 
##     data = final_dataset)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1551.67  -145.44    -8.39   117.82  1688.55 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                -10.003626  13.887834   -0.72    0.471    
## rev_fed_total_per_capita     0.156018   0.004883   31.95   <2e-16 ***
## rev_state_total_per_capita   0.076697   0.001317   58.25   <2e-16 ***
## rev_local_total_per_capita   0.069846   0.001020   68.50   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 247.6 on 5176 degrees of freedom
## Multiple R-squared:  0.5937, Adjusted R-squared:  0.5935 
## F-statistic:  2521 on 3 and 5176 DF,  p-value: < 2.2e-16

As we can see, the revenue from these sources account for the majority of the variance in the amount of money a school spends on building operations. This makes sense as most school districts pass balanced budgets and while districts can raise money through bonds, they do not have the borrowing ability when compared to local, state and federal sources and are subject to voter consent which further reduces their ability to raise money on their own. This is why I will be dividing the dataset into three parts. One will be gun incidents and school budgets between 1999 and 2009, one will be gun incidents and school budgets between 2009 and 2015 and the last one will be gun incidents between 2015 and 2018. First let’s check the same model but with our different time periods.

final_dataset_pre_recession <- final_dataset %>%
  filter(budget_year >= 1999 & budget_year <= 2009) %>%
  filter(year_of_incident <= 2009)
final_dataset_slump <- final_dataset %>%
  filter(budget_year >= 2009 & budget_year <= 2015) %>%
  filter(year_of_incident >= 2009 & year_of_incident <= 2015)
final_dataset_recovery <- final_dataset %>%
  filter(budget_year >= 2015) %>%
  filter(year_of_incident >= 2015)
model <- lm(operation_expenses_per_capita ~ rev_fed_total_per_capita + rev_state_total_per_capita + rev_local_total_per_capita, data = final_dataset_pre_recession)
summary(model)

## 
## Call:
## lm(formula = operation_expenses_per_capita ~ rev_fed_total_per_capita + 
##     rev_state_total_per_capita + rev_local_total_per_capita, 
##     data = final_dataset_pre_recession)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -956.0 -142.9   -4.3  105.8 1486.8 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                -1.985e+02  2.929e+01  -6.775 2.05e-11 ***
## rev_fed_total_per_capita    1.459e-01  9.385e-03  15.547  < 2e-16 ***
## rev_state_total_per_capita  1.016e-01  2.871e-03  35.379  < 2e-16 ***
## rev_local_total_per_capita  8.242e-02  2.254e-03  36.571  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 259.3 on 1074 degrees of freedom
## Multiple R-squared:  0.6919, Adjusted R-squared:  0.691 
## F-statistic: 803.8 on 3 and 1074 DF,  p-value: < 2.2e-16

model <- lm(operation_expenses_per_capita ~ rev_fed_total_per_capita + rev_state_total_per_capita + rev_local_total_per_capita , data = final_dataset_slump)
summary(model)

## 
## Call:
## lm(formula = operation_expenses_per_capita ~ rev_fed_total_per_capita + 
##     rev_state_total_per_capita + rev_local_total_per_capita, 
##     data = final_dataset_slump)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -960.50 -136.89   -3.61  108.14  854.64 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                -2.559e+02  5.023e+01  -5.095 4.94e-07 ***
## rev_fed_total_per_capita    1.973e-01  1.290e-02  15.297  < 2e-16 ***
## rev_state_total_per_capita  8.790e-02  4.956e-03  17.739  < 2e-16 ***
## rev_local_total_per_capita  8.700e-02  3.438e-03  25.304  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 225.6 on 507 degrees of freedom
## Multiple R-squared:  0.6337, Adjusted R-squared:  0.6315 
## F-statistic: 292.4 on 3 and 507 DF,  p-value: < 2.2e-16

model <- lm(operation_expenses_per_capita ~ rev_fed_total_per_capita + rev_state_total_per_capita + rev_local_total_per_capita , data = final_dataset_recovery)
summary(model)

## 
## Call:
## lm(formula = operation_expenses_per_capita ~ rev_fed_total_per_capita + 
##     rev_state_total_per_capita + rev_local_total_per_capita, 
##     data = final_dataset_recovery)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -625.51 -121.01   -1.17  120.11  886.58 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                -68.589537  51.974088   -1.32    0.188    
## rev_fed_total_per_capita     0.199853   0.019051   10.49   <2e-16 ***
## rev_state_total_per_capita   0.068916   0.003979   17.32   <2e-16 ***
## rev_local_total_per_capita   0.072213   0.003792   19.04   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 217.2 on 432 degrees of freedom
## Multiple R-squared:  0.568,  Adjusted R-squared:  0.565 
## F-statistic: 189.3 on 3 and 432 DF,  p-value: < 2.2e-16

We can see that in all three periods. Our model explains most of the variance the operation expenses however as time has passed, the relationship between the federal spending per capita has gotten stronger while the relationship between state and local spending has gotten weaker. The change in the relationship of local and state spending can be explained by the recession itself as unlike the federal government that has the ability to borrow large amounts of money to raise capital, which it did by providing fiscal stimulus in the form of Economic Stimulus Act of 2008 and the American Recovery and Reinvestment Act of 2009 while the former was a tax rebate similar to the recent stimulus checks, the latter pumped $100 billion in the education sector alone between 2009 and 2019 whereas both local and state governments do not have the same ability to provide such fiscal stimulus especially local governments who mostly funded their school districts via property taxes that were in a downturn, while this is a good explanation for the slump period. For the relationship between operations funding and local and state funding to further decrease in the recovery period is surprising. As by late 2012, property values were higher than pre-recession values. This may show that money revenue from federal sources are now becoming a bigger part of covering expenses regarding operations of buildings. In the graph below, we will the funding from different sources between the years 1999 and 2018.

source_mean <- final_dataset %>%
  group_by(budget_year) %>%
  summarise(fed_mean = mean(rev_fed_total_per_capita),state_mean = mean(rev_state_total_per_capita), local_mean = mean(rev_local_total_per_capita))
source_mean %>%
  pivot_longer(cols = c(fed_mean,state_mean,local_mean), names_to = "revenue_source", values_to = "funding") %>%
  ggplot(aes(budget_year,funding,group_by = revenue_source, col = revenue_source)) +
  geom_line()

source_mean

As we can see during our recession period, both local and state funding for education fell whereas the federal government’s went up. We can also see that the share from the federal source started decreasing in 2013, right around when property values had recovered. We can also see both funding amounts from state and local sources starting to pick back up again and are now above pre-pandemic levels of funding as well. This makes the relationship between operations expenses and federal source getting stronger in the recovery period even stranger. The best explanation could be that operations is a smaller priority for school districts hence the extra money coming from federal sources could justify extra spending on operations otherwise other budget items must be paid for first by the primary source of revenues which are state and local. Our second linear model will stay the same but we will add another variable which will be dummy variable for an active school shooting. One means that were was an active shooting and 0 means that there was not an active shooting

model <- lm(operation_expenses_per_capita ~ rev_fed_total_per_capita + rev_state_total_per_capita + rev_local_total_per_capita +  active_shooter_fbi
, data = final_dataset_pre_recession)
summary(model)

## 
## Call:
## lm(formula = operation_expenses_per_capita ~ rev_fed_total_per_capita + 
##     rev_state_total_per_capita + rev_local_total_per_capita + 
##     active_shooter_fbi, data = final_dataset_pre_recession)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -957.04 -143.36   -2.06  105.62 1485.60 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                -1.970e+02  2.929e+01  -6.726 2.83e-11 ***
## rev_fed_total_per_capita    1.455e-01  9.382e-03  15.513  < 2e-16 ***
## rev_state_total_per_capita  1.017e-01  2.871e-03  35.432  < 2e-16 ***
## rev_local_total_per_capita  8.238e-02  2.252e-03  36.573  < 2e-16 ***
## active_shooter_fbi         -9.797e+01  6.340e+01  -1.545    0.123    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 259.1 on 1073 degrees of freedom
## Multiple R-squared:  0.6925, Adjusted R-squared:  0.6914 
## F-statistic: 604.2 on 4 and 1073 DF,  p-value: < 2.2e-16

model <- lm(operation_expenses_per_capita ~ rev_fed_total_per_capita + rev_state_total_per_capita + rev_local_total_per_capita +  active_shooter_fbi
, data = final_dataset_slump)
summary(model)

## 
## Call:
## lm(formula = operation_expenses_per_capita ~ rev_fed_total_per_capita + 
##     rev_state_total_per_capita + rev_local_total_per_capita + 
##     active_shooter_fbi, data = final_dataset_slump)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -961.32 -135.45   -4.71  108.06  852.86 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                -2.541e+02  5.033e+01  -5.048 6.24e-07 ***
## rev_fed_total_per_capita    1.967e-01  1.294e-02  15.194  < 2e-16 ***
## rev_state_total_per_capita  8.801e-02  4.960e-03  17.742  < 2e-16 ***
## rev_local_total_per_capita  8.696e-02  3.441e-03  25.274  < 2e-16 ***
## active_shooter_fbi         -4.113e+01  5.933e+01  -0.693    0.488    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 225.7 on 506 degrees of freedom
## Multiple R-squared:  0.6341, Adjusted R-squared:  0.6312 
## F-statistic: 219.2 on 4 and 506 DF,  p-value: < 2.2e-16

model <- lm(operation_expenses_per_capita ~ rev_fed_total_per_capita + rev_state_total_per_capita + rev_local_total_per_capita +  active_shooter_fbi
, data = final_dataset_recovery)
summary(model)

## 
## Call:
## lm(formula = operation_expenses_per_capita ~ rev_fed_total_per_capita + 
##     rev_state_total_per_capita + rev_local_total_per_capita + 
##     active_shooter_fbi, data = final_dataset_recovery)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -624.99 -122.01   -1.09  119.61  887.95 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                -71.080950  52.329254  -1.358    0.175    
## rev_fed_total_per_capita     0.200581   0.019140  10.480   <2e-16 ***
## rev_state_total_per_capita   0.068933   0.003983  17.307   <2e-16 ***
## rev_local_total_per_capita   0.072285   0.003799  19.026   <2e-16 ***
## active_shooter_fbi          22.563212  51.227210   0.440    0.660    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 217.4 on 431 degrees of freedom
## Multiple R-squared:  0.5682, Adjusted R-squared:  0.5642 
## F-statistic: 141.8 on 4 and 431 DF,  p-value: < 2.2e-16

We can see that in all three time periods. The effect that an active shooter event has on a school district’s budget is statistically non significant and we fail to reject the null hypothesis. It seems that the biggest contributors to the operations expenses variances are the revenue sources for the school district and that in periods where the relationship between operations spending and revenue from federal sources was the strongest, the relationship between local and state government sources and operations spending was at its lowest.

The Effect of Active Shooter Gun Incidents on Public School Districts Security budgets

Farid Balouch

July 11, 2022