See https://www.dropbox.com/s/zqss2h3nfyze1y1/NC%20Results%20Write%20Up.docx?dl=0 for a write up.

Recap: After reviewing the Criminal/Infraction Case Activity Reports put out by the NC Administrative Office of the Courts, we realized that we could not use them to calculate yearly DIH charges because:

  1. They cover fiscal years (which start in July in NC) rather than calendar years.

  2. In NC, a charge filed in district court (as all misdemeanors and some felonies are) is moved to superior court for trial if it is a felony or if a felony is later added. In that case, the dataset would count both the district court filing and the superior court filing as cases, so we would get a double count.
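To make the double count concrete, here is a minimal sketch on made-up data. The `filings` data frame, the case IDs, and the `court` column are all hypothetical; the published reports do not necessarily expose a case identifier like this.

```r
# Hypothetical filings: case "B" is a felony filed in district court
# and then moved to superior court, so it appears twice.
filings <- data.frame(
  case_id = c("A", "B", "B", "C"),
  court   = c("district", "district", "superior", "district"),
  stringsAsFactors = FALSE
)

# Counting rows counts case "B" twice.
n_rows <- nrow(filings)

# Counting distinct case IDs counts each case once.
n_unique <- length(unique(filings$case_id))

print(c(rows = n_rows, unique_cases = n_unique))
```

This is why we asked for counts of unique cases rather than relying on the published report totals.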

So, Jen emailed the clerk of courts and we asked for annual counts of unique cases with the following charges, by district, for 2015-2021:

  1. SECOND DEG MURDER DIST DRUG-0943

  2. DEATH BY DISTRIBUTION-0952

  3. DEATH BY DISTRIBUTION/AGG-0953

We received them by county and then I had to aggregate them into districts based on this statute: Article 9. District Attorneys and Prosecutorial Districts. § 7A-60. District attorneys and prosecutorial districts. https://www.dropbox.com/s/0a09y4halt73ndv/NC%20prosecutorial%20districts.pdf?dl=0

I also added up all of the different types of charges and the total across all years.
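The county-to-district aggregation was done by hand in the spreadsheet. For reference, the same step could be sketched in R roughly as follows. The county names, counts, and the `district_lookup` table are invented for illustration; the real mapping comes from § 7A-60.

```r
# Hypothetical county-level counts (not the real data).
county_counts <- data.frame(
  County  = c("Alpha", "Beta", "Gamma", "Alpha", "Beta", "Gamma"),
  Year    = c(2015, 2015, 2015, 2016, 2016, 2016),
  Charges = c(1, 0, 2, 3, 1, 0)
)

# Hypothetical lookup table: which prosecutorial district each county is in.
district_lookup <- data.frame(
  County   = c("Alpha", "Beta", "Gamma"),
  District = c(1, 1, 2)
)

# Attach the district to every county row.
with_district <- merge(county_counts, district_lookup, by = "County")

# Sum within district and year, then across all years.
by_district_year <- aggregate(Charges ~ District + Year, with_district, sum)
by_district      <- aggregate(Charges ~ District, with_district, sum)

print(by_district_year)
print(by_district)
```

Keeping the lookup as its own table means the statute-based mapping only has to be entered once and can be reused for the overdose data.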

Dataset can be found here: https://www.dropbox.com/s/c8c3tu04icfbdbt/Courts%20Data_Death%20by%20Distribution%20.xlsx?dl=0

We also wanted to make sure we included stimulant- and opioid-related deaths, so we asked the NC DHHS to give us all fatal accidental overdose deaths by county from 2015-2021. They provided us with 2016-2021 in a dataset titled "North Carolina Unintentional Medication/Drug Overdose Death Counts, by Quarter and County, 2016-2021".

To come up with the numbers for 2015, I used the dataset Brandon got from NC DHHS of opioid and stimulant fatal overdoses (https://www.dropbox.com/s/xtzndub7gb0zlcy/Stimulant%20%26%20OD%20Death_NC%20to%202020.xlsx?dl=0) and aggregated the opioid and stimulant counts for each county for 2015 in the second tab.
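The 2015 step was also done in the spreadsheet. A rough R equivalent, on invented numbers, might look like the sketch below. The column names `Opioid` and `Stimulant` are assumptions, and the sketch assumes a death cannot appear in both categories; if it can, a simple sum would count it twice.

```r
# Hypothetical 2015 county counts (not the real data).
od_2015 <- data.frame(
  County    = c("Alpha", "Beta", "Gamma"),
  Opioid    = c(10, 4, NA),
  Stimulant = c(3, 1, 2)
)

# Sum the two categories per county.
# na.rm = TRUE treats a missing count as zero, which is itself an assumption.
od_2015$Total_2015 <- rowSums(od_2015[, c("Opioid", "Stimulant")], na.rm = TRUE)

print(od_2015)
```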

Aggregate Counties into Districts

[Import DIH Counts]

library(readxl)
Courts_Data_Death_by_Distribution_ <- read_excel("/Users/taleedel-sabawi/Dropbox/1-Research/DIH/DIH Data/Courts Data_Death by Distribution .xlsx", sheet = "Districts by Year")

[change name of dataframe to make it shorter]

courts <- Courts_Data_Death_by_Distribution_

[change name of aggregate variable]

courts$Tots <-courts$Aggregate

[The districts are numbered, so tell R to treat the variable as a factor rather than a number]

courts$Districts <- factor(courts$District)

[aggregate count of all types of DIH charges by district]

DIH <- aggregate(Tots ~ Districts, courts, sum)

DIH$Tots is the outcome variable of interest

To aggregate total OD deaths by district for the time period…

NOTE: 2021 is considered provisional data at this point.

[Import OD Counts]

library(readxl)
OD_data <- read_excel("/Users/taleedel-sabawi/Dropbox/1-Research/DIH/DIH Data/2022-12-26-MedDrug by County .xlsx", sheet = "Import into R")
View(OD_data)

[The districts are numbered, so tell R to treat the variable as a factor rather than a number]

OD_data$Districts <- factor(OD_data$District)

[aggregate count of all types of ODs by district]

OD <- aggregate(Total ~ Districts, OD_data, sum)

Now to merge the two dataframes OD and DIH…

dataset <- merge(OD, DIH, by = "Districts")
dataset$OD <- dataset$Total # renaming these variables so they make sense
dataset$DIH <- dataset$Tots
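One thing worth checking here: by default `merge()` keeps only the districts that appear in both data frames, so a district that is missing or labeled differently in one file drops out silently. A minimal sketch of how to check for that, using toy stand-ins (`OD_toy` and `DIH_toy` are hypothetical, not the real data):

```r
# Toy stand-ins for the OD and DIH data frames.
OD_toy  <- data.frame(Districts = c("1", "2", "3"), Total = c(100, 250, 80),
                      stringsAsFactors = FALSE)
DIH_toy <- data.frame(Districts = c("1", "2", "4"), Tots = c(3, 7, 1),
                      stringsAsFactors = FALSE)

# Default merge: only districts present in both (an inner join).
inner <- merge(OD_toy, DIH_toy, by = "Districts")

# all = TRUE keeps every district and fills the gaps with NA.
outer <- merge(OD_toy, DIH_toy, by = "Districts", all = TRUE)

# Districts that failed to match on one side or the other.
unmatched <- outer[is.na(outer$Total) | is.na(outer$Tots), ]

print(nrow(inner))
print(unmatched)
```

If `unmatched` has no rows on the real data, the default merge above lost nothing.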

Then with the old data…

 library(readxl)
Final_JCOIN_data <- read_excel("/Users/taleedel-sabawi/Dropbox/1-Research/Grants/JCOIN/Final_JCOIN_data.xlsx",
    col_types = c("numeric", "skip", "skip", 
        "skip", "skip", "numeric", "skip", 
        "skip", "skip", "skip", "numeric", 
        "numeric", "numeric", "skip"))
View(Final_JCOIN_data)
data <-merge(dataset, Final_JCOIN_data, by.x = "Districts", by.y = "District")

(The final dataset that we will use is called “data”.)

Descriptive Statistics

Number of Unintentional Overdose Deaths from 2015-2021 by District (all meds for 2016-2021, opioids & stimulants for 2015)

See above for variable description. We used this date range because it matches the date ranges used for DIH charges.

summary(data$OD)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    20.0   216.8   353.0   392.6   506.5  1433.0

DIH Charges

See the description above. These are calendar years 2015-2021.

DIH charges

summary(data$DIH)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   2.000   4.500   5.524   7.000  29.000

Number of Prosecutors

Number of prosecutors per prosecutorial district. Source: NC statute (see above).

summary(data$Pros)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    5.00    9.00   12.00   14.67   15.00   58.00

Rural v. Non-rural

The base category is non-rural. If a district contains at least one rural county, it is coded as rural (1), even if its other counties are not rural. If a district contains only suburban and urban counties, it is coded as non-rural (0).

Source: https://www.ncruralcenter.org/about-us/
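The coding rule above ("rural if any county in the district is rural") can be sketched as follows. The counties, districts, and the county-level `Rural` flag are made up for illustration; in our data `Rural_2` was already coded at the district level.

```r
# Hypothetical county-level classification: 1 = rural, 0 = suburban or urban.
county_rural <- data.frame(
  County   = c("Alpha", "Beta", "Gamma", "Delta"),
  District = c(1, 1, 2, 2),
  Rural    = c(1, 0, 0, 0)
)

# A district is rural if ANY of its counties is rural,
# which is the same as taking the maximum of the 0/1 flags.
district_rural <- aggregate(Rural ~ District, county_rural, max)
names(district_rural)[2] <- "Rural_2"

# Store as a factor with non-rural (0) as the base category.
district_rural$Rural_2 <- factor(district_rural$Rural_2, levels = c(0, 1))

print(district_rural)
```

Here district 1 comes out rural because of one rural county, and district 2 comes out non-rural.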

data$Rural_2<- as.factor(data$`Rural_2`)
summary(data$Rural_2)
##  0  1 
## 13 29

So, there are 13 non-rural districts in the dataset and 29 rural districts.

Ttl_Crim

“Ttl_Crim” represents the number of criminal charges filed by prosecutors in each district in 2018-2019 in District Courts. We decided we weren’t going to include this variable because (1) it doesn’t include superior courts and (2) the remaining variables cover 2015-2021, so if we included this variable we would need to compile it for those years too. I’m going to include the analysis here for our informational purposes.

Source: https://www.nccourts.gov/documents/publications/criminalinfraction-case-activity-report-by-prosecutorial-district

summary(data$Ttl_Crim)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   15.00   83.25  133.00  171.33  224.50  470.00

Population

If it is of interest… here are the differences in population based on the 2009 Census data. 2009 Census data source: https://files.nc.gov/ncosbm/demog/countygrowth_2009.html

summary(data$Population)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   73553  138475  178095  223534  234812  906473

OD

Let’s start by looking at the relationship between the OD deaths and the number of DIH charges. Do districts that tend to have more ODs have more DIH charges?

library('dplyr')
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
f <- ggplot(data, aes(DIH, OD))
f + geom_bar(stat="identity")

library(ggplot2)
# Basic line plot with points
ggplot(data=data, aes(x=OD, y=DIH, group=1)) +
  geom_line(color="blue")+
  geom_point()

An OD is the underlying incident that gives rise to a DIH charge. So, theoretically, the more OD deaths there are, the more “opportunities” the prosecutor has to file a DIH charge. We would then expect the number of DIH charges to increase as the number of overdose deaths increases. We don’t see that linear relationship here. The district with the most DIH charges has nowhere near the most ODs.
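If we want a number to go with the plot, a rank correlation is one option, since it does not assume linearity and is less sensitive to a single extreme district. This is only a sketch on invented values (`toy` is hypothetical); on the real data the call would use `data$OD` and `data$DIH`.

```r
# Invented district-level values, for illustration only.
toy <- data.frame(
  OD  = c(20, 150, 300, 450, 600, 1400),
  DIH = c(2, 29, 4, 7, 0, 5)
)

# Spearman's rho: correlation between the ranks of OD and the ranks of DIH.
# Values near 0 mean districts with more ODs do not systematically
# have more (or fewer) DIH charges.
rho <- cor(toy$OD, toy$DIH, method = "spearman")

print(rho)
```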

Pros

Other studies have found that as the number of prosecutors in a district increases, so does the number of criminal charges. Let’s see if this is the case in NC with the previous variable, Ttl_Crim (just for fun).

e <- ggplot(data=data, aes(x=Pros, y=Ttl_Crim, group=1)) +
  geom_line(color="blue")+
  geom_point()
e

This is odd. It suggests that some of the prosecutorial districts with the fewest prosecutors have the most criminal charges, but the relationship is not linear.

Let’s see what the relationship is with only DIH charges

g <- ggplot(data=data, aes(x=Pros, y=DIH, group=1)) +
  geom_line(color="blue")+
  geom_point()
g

A similar but notably different pattern. Again, we should probably not include the analysis of Ttl_Crim.

Ttl_Crim

Again, this analysis is just for our eyes. The literature also suggests that the total number of crimes charged should predict prosecutorial activity.

h <- ggplot(data=data, aes(x=DIH, y=Ttl_Crim, group=1)) +
  geom_line(color="blue")+
  geom_point()
h

There doesn’t seem to be a relationship between the total number of crimes charged and the number of DIH cases charged…

Rural_2

Let’s take a look at urban/rural differences in DIH charges filed.

library(doBy)
## 
## Attaching package: 'doBy'
## The following object is masked from 'package:dplyr':
## 
##     order_by
summaryBy(DIH ~ Rural_2, data = data, 
          FUN = list(mean, max, min, median, sd))
##   Rural_2 DIH.mean DIH.max DIH.min DIH.median   DIH.sd
## 1       0 4.615385      13       1          4 3.548203
## 2       1 5.931034      29       0          5 5.712962

Both the mean (non-rural districts = 4.6, districts with a rural county = 5.9) and the median (non-rural districts = 4, districts with a rural county = 5) suggest that the number of DIH charges filed is similar between districts with rural counties and those without. However, the range is much wider in districts with a rural county: 0-29 versus 1-13.

Population

Population and DIH charges

ggplot(data=data, aes(x=Population, y=DIH)) +
 geom_line(color="purple2")+
  geom_point()

#  geom_bar(stat="identity", color="blue", fill="white")
 # geom_text(aes(label=OD_19), vjust=1.6, color="white", #size=3.5)+  theme_minimal()

It looks like the district with the 2nd smallest population per the 2009 census has the greatest number of DIH charges.

Population and OD

ggplot(data=data, aes(x=Population, y=OD)) +
  geom_line(color="purple")+
  geom_point(color="blue")

Well, this is odd… This makes me wonder if I miscalculated the OD statistics somehow, because we have two very populous districts that have almost no ODs. That doesn’t look right…
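One way to chase this down is to compute a rough overdose rate per 100,000 residents and flag districts that come out implausibly low. The sketch below uses invented numbers (`toy` is hypothetical) and an arbitrary cutoff of one quarter of the median rate. Because the population figures are from 2009 and the OD counts are summed over 2015-2021, the rate is only a screening tool, not a real incidence rate.

```r
# Invented district-level values, for illustration only.
toy <- data.frame(
  Districts  = c("1", "2", "3", "4"),
  Population = c(80000, 150000, 600000, 900000),
  OD         = c(200, 350, 25, 1400),
  stringsAsFactors = FALSE
)

# Rough rate: cumulative ODs per 100,000 residents.
toy$OD_per_100k <- toy$OD / toy$Population * 1e5

# Flag districts whose rate is far below the typical district.
# The 0.25 multiplier is an arbitrary choice for this sketch.
cutoff     <- 0.25 * median(toy$OD_per_100k)
suspicious <- toy[toy$OD_per_100k < cutoff, ]

print(suspicious)
```

Any district flagged this way on the real data would be worth rechecking against the county-to-district mapping and the county rows in the import sheet.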

Population & Pros

ggplot(data=data, aes(x=Pros, y=Population)) +
  geom_point(colour="purple")

Just for us to note: the prosecutors’ offices with the most prosecutors are in the districts with the largest populations (so the two variables are related).