All Datasets, files, and images

sdutkyData110Project1TheOtherVictims.Rmd should run in your environment

’’ The Washington Post School Shootings

School shootings are horrific events. By 2018 the Post counted 144 children, teachers and others killed in school assaults and 302 physically injured. All present, however, suffer trauma. Using school enrollment, the Post tallied 228,000 children at 234 school incidents.

What of the other victims?

Fortunately, school shootings are rare events. Many children, however, live in proximity to homicidal violence. This is rarely newsworthy.

Mimicking the Post, I set out to count the homicides occuring within a threshold distance of a school and, using school enrollments, estimating the number of students affected. This involved an API and several datasets:

  1. The Washington Post’s Homicide Database spans over 52,000 homicides occuring in the 50 largest U.S. cities between January 1, 2007 and December 31, 2015. Each record provides, among other items, the victim’s age and ethnicity, the date the crime was reported and the geocoordinates where the victim was found.
  2. Education Demographic and Geographic Estimates (EDGE) Geocodes provides the geocoordinates and census identifiers of more than 102,000 U.S. public schools and agencies.
  3. National Center for Education Statistics gives state by state each public school’s enrollment, the number of teachers, and the number of students receiving free or reduced price lunches.
  4. The Washington Post’s database of school shootings in the United States since Columbine. provides a variety of data about each shooting as well as the schools at which they occurred.
  5. Association for Learning Environments: " Currently many states follow these site formulas:
    Elementary Schools = 10 acres plus 1 acre for every 100 students;
    Junior High/Middle Schools = 20 acres plus 1 acre for every 100 students;
    Senior High Schools = 30 acres plus 1 acre for every 100 students."
  6. U.S. Census Batch Street Address Geocoding
  7. Bounding Boxes for States used to verify that the geocoordinates of each homicide landed in the states in which they were reported.

Summation

Combining these datasets discloses, over a period of ten years, nearly two million students enrolled in schools in which a homicide was reported within a mean distance of 142 meters. These schools reported a mean rate of 89% free and reduced price lunches. This indicates that a large majority of these students lived in economically precarious households.

How close in time and space must a child be near to a homicide to become traumatized? However one answers this, I believe that the data here indicate that the potential trauma suffered by the students enrolled at the sites of school shootings might be eclipsed by the nation wide toll of routine violence.

# load libraries
library(tidyverse)
## ── Attaching packages ────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.2
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   0.8.3     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ───────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(geosphere)
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date

Parameters:

param<-list()
param$halfBoxDistance<-174 # meters to sides of bounding box enclosing homicide. Used to find schools inside box
param$homicideUrl="https://github.com/sdutky/mcData110/raw/master/otherVictims/homicide-data.csv"
param$schoolCoordUrl="https://github.com/sdutky/mcData110/raw/master/otherVictims/public-school-locations-2017-18.csv"
param$schoolNcesUrl="https://github.com/sdutky/mcData110/raw/master/otherVictims/nces.csv"
param$schoolNcesGeocodingsUrl="https://github.com/sdutky/mcData110/raw/master/otherVictims/ncesGeocodings.csv"
param$stateBboxUrl="https://gist.githubusercontent.com/a8dx/2340f9527af64f8ef8439366de981168/raw/81d876daea10eab5c2675811c39bcd18a79a9212/US_State_Bounding_Boxes.csv"

load, view, clean, & prep homicide dataset

homicide<-read_csv(param$homicideUrl)
## Parsed with column specification:
## cols(
##   uid = col_character(),
##   reported_date = col_double(),
##   victim_last = col_character(),
##   victim_first = col_character(),
##   victim_race = col_character(),
##   victim_age = col_character(),
##   victim_sex = col_character(),
##   city = col_character(),
##   state = col_character(),
##   lat = col_double(),
##   lon = col_double(),
##   disposition = col_character()
## )
homicide$victim_age<-as.numeric(homicide$victim_age)
## Warning: NAs introduced by coercion
#str(homicide)
#summary(homicide)

# drop rows where either lat or lon is NA
homicide<-homicide %>%
  drop_na(lat,lon)
         
summary(homicide)
##      uid            reported_date       victim_last       
##  Length:52119       Min.   : 20070101   Length:52119      
##  Class :character   1st Qu.: 20100318   Class :character  
##  Mode  :character   Median : 20121216   Mode  :character  
##                     Mean   : 20130905                     
##                     3rd Qu.: 20150910                     
##                     Max.   :201511105                     
##                                                           
##  victim_first       victim_race          victim_age     victim_sex       
##  Length:52119       Length:52119       Min.   :  0.0   Length:52119      
##  Class :character   Class :character   1st Qu.: 22.0   Class :character  
##  Mode  :character   Mode  :character   Median : 28.0   Mode  :character  
##                                        Mean   : 31.8                     
##                                        3rd Qu.: 40.0                     
##                                        Max.   :102.0                     
##                                        NA's   :2997                      
##      city              state                lat             lon         
##  Length:52119       Length:52119       Min.   :25.73   Min.   :-122.51  
##  Class :character   Class :character   1st Qu.:33.77   1st Qu.: -96.00  
##  Mode  :character   Mode  :character   Median :38.52   Median : -87.71  
##                                        Mean   :37.03   Mean   : -91.47  
##                                        3rd Qu.:40.03   3rd Qu.: -81.76  
##                                        Max.   :45.05   Max.   : -71.01  
##                                                                         
##  disposition       
##  Length:52119      
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
## 
#fix error
a<-seq_along(homicide$state)[homicide$state=="wI"]
homicide$state[a]="WI"
print(sort(unique(homicide$state)))
##  [1] "AL" "AZ" "CA" "CO" "DC" "FL" "GA" "IL" "IN" "KY" "LA" "MA" "MD" "MI"
## [15] "MN" "MO" "NC" "NE" "NM" "NV" "NY" "OH" "OK" "PA" "TN" "TX" "VA" "WI"

filter homicide rows where lat,lon lie outside of the bounding boxes of their respective states

bb<-read_csv(param$stateBboxUrl)
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
##   X1 = col_double(),
##   STATEFP = col_character(),
##   STUSPS = col_character(),
##   NAME = col_character(),
##   xmin = col_double(),
##   ymin = col_double(),
##   xmax = col_double(),
##   ymax = col_double()
## )
names(bb)[3]<-"state"
bb<-bb[,c("state","xmin","xmax","ymin","ymax")]
hDropped<-nrow(homicide) # get snapshot before
homicide<- homicide %>%
  inner_join(bb,by="state") 
homicide<- homicide %>%
  filter(!(lat<=xmax & lat>=xmin & lon<=ymax & lon>=ymin))
hDropped<-hDropped-nrow(homicide) # and after to compare to before
#  All pass filter: no drops

# drop xmin,xmax,ymin,ymax:
homicide<-homicide %>% select(-c(xmin,xmax,ymin,ymax))

0 homicides outside of state borders dropped

Identify all schools located within threshold distance of a homicide

bBox<-function(lat,lon,threshold=param$halfBoxDistance) {
  # return list of mid-point lats & lons of bounding box
  
  # determine the meter distance of 1 degree at location
  dLon<-function(lon,lat) distGeo(c(lon,lat),c(lon+1,lat))
  dLat<-function(lon,lat) distGeo(c(lon,lat),c(lon,lat+1))
  
  dlon<-mapply(dLon, lon, lat)
  dlat<-mapply(dLat, lon, lat)
  
  eastSide<-lon+threshold/dlon
  westSide<-lon-threshold/dlon
  northSide<-lat+threshold/dlat
  southSide<-lat-threshold/dlat
  return(tibble(eastSide=eastSide,westSide=westSide,
                southSide=southSide,northSide=northSide))
}

generate the bonding boxes enclosing each homicide and bind it to the homicide tibble

boxes<-bBox(homicide$lat,homicide$lon)
homicide<-cbind(homicide,boxes)

fetch and clean the NCESS dataset of all schools

allSchools<-read_csv(param$schoolNcesUrl)
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   `NCES School ID` = col_double(),
##   `NCES District ID` = col_double(),
##   ZIP = col_double(),
##   `ZIP 4-digit` = col_double()
## )
## See spec(...) for full column specifications.
summary(allSchools) # pre-cleaning
##  NCES School ID      State School ID    NCES District ID 
##  Min.   :1.000e+10   Length:77862       Min.   : 100005  
##  1st Qu.:1.202e+11   Class :character   1st Qu.:1201500  
##  Median :2.681e+11   Mode  :character   Median :2680620  
##  Mean   :2.724e+11                      Mean   :2723611  
##  3rd Qu.:4.013e+11                      3rd Qu.:4012615  
##  Max.   :5.517e+11                      Max.   :5517220  
##                                                          
##  State District ID   Low Grade*        High Grade*       
##  Length:77862       Length:77862       Length:77862      
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##  School Name          District         County Name*      
##  Length:77862       Length:77862       Length:77862      
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##  Street Address         City              State                ZIP       
##  Length:77862       Length:77862       Length:77862       Min.   : 1001  
##  Class :character   Class :character   Class :character   1st Qu.:32750  
##  Mode  :character   Mode  :character   Mode  :character   Median :54986  
##                                                           Mean   :54655  
##                                                           3rd Qu.:78043  
##                                                           Max.   :97635  
##                                                                          
##   ZIP 4-digit       Phone           Locale Code*         Locale*         
##  Min.   :   1    Length:77862       Length:77862       Length:77862      
##  1st Qu.:1710    Class :character   Class :character   Class :character  
##  Median :3206    Mode  :character   Mode  :character   Mode  :character  
##  Mean   :3901                                                            
##  3rd Qu.:5414                                                            
##  Max.   :9999                                                            
##  NA's   :23865                                                           
##    Charter            Magnet*          Title I School*   
##  Length:77862       Length:77862       Length:77862      
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##  Title 1 School Wide*  Students*          Teachers*        
##  Length:77862         Length:77862       Length:77862      
##  Class :character     Class :character   Class :character  
##  Mode  :character     Mode  :character   Mode  :character  
##                                                            
##                                                            
##                                                            
##                                                            
##  Student Teacher Ratio* Free Lunch*        Reduced Lunch*    
##  Length:77862           Length:77862       Length:77862      
##  Class :character       Class :character   Class :character  
##  Mode  :character       Mode  :character   Mode  :character  
##                                                              
##                                                              
##                                                              
## 
names(allSchools)<-str_to_title(names(allSchools))
names(allSchools)<-gsub("[[:punct:] ]*","",names(allSchools))

allSchools<-allSchools %>%
      mutate(
        City=str_to_title(City),
        FreeLunch=as.numeric(FreeLunch),
        ReducedLunch=as.numeric(ReducedLunch),
        Students=as.numeric(Students),
        Teachers=as.numeric(Teachers),
        StudentTeacherRatio=as.numeric(StudentTeacherRatio),
        SubsidizedLunchFraction=(FreeLunch+ReducedLunch)/Students,
        NcesSchoolId=as.character(NcesSchoolId)
                          ) %>%
     drop_na(Students,Teachers) %>%
     filter(Students<5000) %>% # remove 8 cyber academies
     filter(Teachers<1000) %>% # Florida Virtual School
     filter(Students>=1 & Teachers>=1) 
## Warning: NAs introduced by coercion

## Warning: NAs introduced by coercion

## Warning: NAs introduced by coercion

## Warning: NAs introduced by coercion

## Warning: NAs introduced by coercion
allSchools$SubsidizedLunchFraction[allSchools$SubsidizedLunchFraction>1]<-NA

summary(allSchools)
##  NcesSchoolId       StateSchoolId      NcesDistrictId   
##  Length:70151       Length:70151       Min.   : 100005  
##  Class :character   Class :character   1st Qu.:1300840  
##  Mode  :character   Mode  :character   Median :2710060  
##                                        Mean   :2779878  
##                                        3rd Qu.:4021750  
##                                        Max.   :5517220  
##                                                         
##  StateDistrictId      LowGrade          HighGrade        
##  Length:70151       Length:70151       Length:70151      
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##   SchoolName          District          CountyName       
##  Length:70151       Length:70151       Length:70151      
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##  StreetAddress          City              State                Zip       
##  Length:70151       Length:70151       Length:70151       Min.   : 1001  
##  Class :character   Class :character   Class :character   1st Qu.:32063  
##  Mode  :character   Mode  :character   Mode  :character   Median :53959  
##                                                           Mean   :53705  
##                                                           3rd Qu.:77401  
##                                                           Max.   :97635  
##                                                                          
##    Zip4Digit        Phone            LocaleCode           Locale         
##  Min.   :   1    Length:70151       Length:70151       Length:70151      
##  1st Qu.:1699    Class :character   Class :character   Class :character  
##  Median :3176    Mode  :character   Mode  :character   Mode  :character  
##  Mean   :3867                                                            
##  3rd Qu.:5322                                                            
##  Max.   :9999                                                            
##  NA's   :20995                                                           
##    Charter             Magnet          TitleISchool      
##  Length:70151       Length:70151       Length:70151      
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##  Title1SchoolWide      Students         Teachers      StudentTeacherRatio
##  Length:70151       Min.   :   1.0   Min.   :  1.00   Min.   :  0.00     
##  Class :character   1st Qu.: 301.0   1st Qu.: 19.08   1st Qu.: 13.50     
##  Mode  :character   Median : 477.0   Median : 29.30   Median : 15.70     
##                     Mean   : 568.8   Mean   : 34.67   Mean   : 16.56     
##                     3rd Qu.: 698.0   3rd Qu.: 42.97   3rd Qu.: 18.60     
##                     Max.   :4841.0   Max.   :308.35   Max.   :852.00     
##                                                                          
##    FreeLunch     ReducedLunch     SubsidizedLunchFraction
##  Min.   :   0   Min.   :   0.00   Min.   :0.000          
##  1st Qu.:  93   1st Qu.:   8.00   1st Qu.:0.347          
##  Median : 201   Median :  25.00   Median :0.560          
##  Mean   : 273   Mean   :  36.07   Mean   :0.557          
##  3rd Qu.: 373   3rd Qu.:  47.00   3rd Qu.:0.785          
##  Max.   :3866   Max.   :2928.00   Max.   :1.000          
##  NA's   :7180   NA's   :8612      NA's   :8652

join up allSchools with schoolEncodings by NcesSchoolId to merge geolocations with school details

schoolCoord<-read_csv(param$schoolCoordUrl)
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   ZIP = col_double(),
##   LOCALE = col_double(),
##   LAT = col_double(),
##   LON = col_double(),
##   CBSATYPE = col_double()
## )
## See spec(...) for full column specifications.
names(schoolCoord)<-str_to_title(names(schoolCoord))

geoSchools<- schoolCoord[,c("Ncessch","Lat","Lon")] %>%
      inner_join(allSchools,by=c("Ncessch"="NcesSchoolId"  ))

look at the distribution of homicides by city and the distribution of schools in cities with homicides

homicideCityCount<-homicide %>%
  group_by(city) %>%
  summarise(homicides=n())

schoolsCityCount<- allSchools %>%
      group_by(City) %>%
      summarise(schools=n())

homicidesJoinedSchoolsByCity<-inner_join(homicideCityCount,schoolsCityCount,by=c("city" ="City"))

print(homicidesJoinedSchoolsByCity,n=300)
## # A tibble: 50 x 3
##    city           homicides schools
##    <chr>              <int>   <int>
##  1 Albuquerque          375     174
##  2 Atlanta              973     133
##  3 Baltimore           2827     287
##  4 Baton Rouge          424      97
##  5 Birmingham           799      60
##  6 Boston               612      17
##  7 Buffalo              520      94
##  8 Charlotte            687     173
##  9 Chicago             5535     547
## 10 Cincinnati           694     152
## 11 Columbus            1079     291
## 12 Dallas              1567     373
## 13 Denver               312     237
## 14 Detroit             2519     171
## 15 Durham               276      69
## 16 Fort Worth           549     217
## 17 Fresno               483     159
## 18 Houston             2933     681
## 19 Indianapolis        1322     208
## 20 Jacksonville        1168     208
## 21 Kansas City         1190     144
## 22 Las Vegas           1371     268
## 23 Long Beach           378      78
## 24 Los Angeles         2257     399
## 25 Louisville           575     185
## 26 Memphis             1514     207
## 27 Miami                744     265
## 28 Milwaukee           1115     167
## 29 Minneapolis          366     117
## 30 Nashville            757     135
## 31 New Orleans         1434      66
## 32 New York             625     344
## 33 Oakland              947     100
## 34 Oklahoma City        672     145
## 35 Omaha                409     148
## 36 Philadelphia        3036     308
## 37 Phoenix              914     250
## 38 Pittsburgh           631     143
## 39 Richmond             429     183
## 40 Sacramento           374     178
## 41 San Antonio          832     457
## 42 San Bernardino       275      69
## 43 San Diego            459     233
## 44 San Francisco        663     104
## 45 Savannah             246      52
## 46 St. Louis           1672      28
## 47 Stockton             443     104
## 48 Tampa                208     160
## 49 Tulsa                584     104
## 50 Washington          1345     244

getMatchingSchoolsByCity<-function(cityArg,tblHomicides,tblGeoSchools) {
  as_tibble( tblHomicides %>% 
    filter(city==cityArg) %>%
    mutate(dummy=TRUE) %>%
    inner_join(tblGeoSchools %>% 
                 filter(City==cityArg) %>%
                 mutate(dummy=TRUE)) %>%
    filter(Lat>southSide,Lat<northSide,Lon>westSide,Lon<eastSide) %>%
    select(-dummy,-city,-state)
  )
} 

Collect the schools that fit into the bounding box of each homicide

schoolsNearHomicides<-tibble()
for (city in homicideCityCount$city) {
  schoolsNearHomicides<-rbind(schoolsNearHomicides,getMatchingSchoolsByCity(city,homicide,geoSchools))
}
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
print(
  paste(
    nrow(schoolsNearHomicides),
    "homicides occuring within",
    param$halfBoxDistance,
    "meters of a school traumatizing",
    sum(schoolsNearHomicides$Students),
    "students")
)
## [1] "4212 homicides occuring within 174 meters of a school traumatizing 1969084 students"

Calculate distance to school, arrange by date, insert cummulative sums of homicides and students affected

schoolsNearHomicides <- schoolsNearHomicides %>%
  arrange(reported_date) 
  
schoolsNearHomicides$cumHomicides<-cumsum(rep(1,nrow(schoolsNearHomicides)))
schoolsNearHomicides$cumTrauma<-cumsum(schoolsNearHomicides$Students)

f<-function(lat,lon,Lat,Lon) distGeo(c(lon,lat),c(Lon,Lat))

schoolsNearHomicides$distance<-mapply(f,schoolsNearHomicides$lat,schoolsNearHomicides$lon,schoolsNearHomicides$Lat,schoolsNearHomicides$Lon)

Partition allSchools into those not near a homicide

schoolsNotNearHomicides<-allSchools %>%
      anti_join(schoolsNearHomicides,
                by=c("NcesSchoolId"="Ncessch"))

Between 2007-01-01 and 2017-12-29, 1,969,084 students exposed to 4212 homicides occuring within a mean of 142 meters of a school.

print.data.frame( schoolsNearHomicides %>%
  group_by(City) %>%
  summarise(exposure=sum(Students),homicides=n()) %>%
  arrange(desc(exposure)) %>%
  mutate(rate=round(exposure/homicides)) %>%
  mutate(exposure=format(exposure,big.mark = ",",scientific = FALSE)),
  width=120)
##             City exposure homicides rate
## 1        Chicago  432,727       942  459
## 2   Philadelphia  360,567       607  594
## 3      Baltimore  185,097       455  407
## 4     Washington  143,185       396  362
## 5        Houston  100,499       159  632
## 6      Milwaukee   61,449       149  412
## 7    New Orleans   56,894       111  513
## 8        Detroit   51,384       124  414
## 9         Dallas   49,321        97  508
## 10         Miami   43,413        98  443
## 11      New York   39,448        84  470
## 12       Memphis   37,426        85  440
## 13     Las Vegas   36,468        51  715
## 14    Cincinnati   31,166        56  557
## 15       Buffalo   29,439        51  577
## 16      Columbus   26,783        84  319
## 17   Kansas City   26,081        66  395
## 18  Jacksonville   22,322        57  392
## 19    Pittsburgh   20,243        60  337
## 20  Indianapolis   18,476        47  393
## 21   San Antonio   18,301        41  446
## 22    Louisville   17,302        34  509
## 23       Atlanta   15,753        28  563
## 24         Tulsa   15,294        28  546
## 25     Nashville   14,921        32  466
## 26   Minneapolis   14,771        53  279
## 27      Richmond   14,311        29  493
## 28         Omaha   11,766        20  588
## 29   Baton Rouge   11,410        26  439
## 30         Tampa    9,615        14  687
## 31 Oklahoma City    9,330        18  518
## 32    Fort Worth    8,707        14  622
## 33        Boston    8,670        17  510
## 34     Charlotte    8,315        13  640
## 35   Albuquerque    7,426        26  286
## 36     St. Louis    6,045        29  208
## 37        Durham    2,554         7  365
## 38      Savannah    2,205         4  551

Plots

g<-ggplot(schoolsNearHomicides, aes(x=ymd(reported_date))) +                    # basic graphical object
  labs(title="cumulative homicides and students exposed")+
  #scale_x_continuous(name="date")+
  scale_y_continuous(name="counts", trans="log10")+
   xlab("date")+
  #ylab("count")+
  geom_line(aes(y=(cumTrauma)), colour="red",size=1) +  # first layer
  geom_line(aes(y=(cumHomicides)), colour="green",size=1)+  # second layer
  scale_colour_manual("", 
                      labels = c("cT", "cH"),
                      values = c("red", "green")) 
  #scale_color_discrete(name = "incidents", labels = c("cumHomicides", "cumTrauma"))
g

title="Distribtution of Distances of Schools from Homicides"

cols=c("Not Near Homicides"="#FF0000","Near Homicides"="#00FF00")
a<-ggplot()+
  labs(title=title,x="distance in meters",
      y="count")+
geom_density(data=schoolsNearHomicides, aes(x=distance,y=..count..),color='green',show.legend=TRUE) +
  geom_vline(data=schoolsNearHomicides,aes(xintercept = mean(distance,na.rm = TRUE)),
             linetype = "dashed",color='green', size = 0.8)+
  geom_vline(data=schoolsNearHomicides,aes(xintercept = mean(distance,na.rm = TRUE)+sd(distance,na.rm = TRUE)),
             linetype = "dashed",color='green', size = 0.4)+
  geom_vline(data=schoolsNearHomicides,aes(xintercept = mean(distance,na.rm = TRUE)-sd(distance,na.rm = TRUE)),
             linetype = "dashed",color='green', size = 0.4)
a

title="Comparison of Schools Nearby and Distant from Homicides"

cols=c("Not Near Homicides"="#FF0000","Near Homicides"="#00FF00")
a<-ggplot()+
  labs(title=title,x="Fraction of Students receiving Subsidized Lunch",
      y="count")+
  scale_colour_manual(name="Schools",values=cols)+
  scale_fill_manual(name="Schools",values=cols) +
geom_density(data=schoolsNearHomicides, aes(x=SubsidizedLunchFraction,y=..count..),color='green',show.legend=TRUE) +
  geom_vline(data=schoolsNearHomicides,aes(xintercept = mean(SubsidizedLunchFraction,na.rm = TRUE)),
             linetype = "dashed",color='green', size = 0.8)+
  geom_vline(data=schoolsNearHomicides,aes(xintercept = mean(SubsidizedLunchFraction,na.rm = TRUE)+sd(SubsidizedLunchFraction,na.rm = TRUE)),
             linetype = "dashed",color='green', size = 0.4)+
  geom_vline(data=schoolsNearHomicides,aes(xintercept = mean(SubsidizedLunchFraction,na.rm = TRUE)-sd(SubsidizedLunchFraction,na.rm = TRUE)),
             linetype = "dashed",color='green', size = 0.4)+
geom_density(data=schoolsNotNearHomicides, aes(x=SubsidizedLunchFraction,y=..count..), color='red',show.legend=TRUE)+
  geom_vline(data=schoolsNotNearHomicides,aes(xintercept = mean(SubsidizedLunchFraction,na.rm = TRUE)), 
             linetype = "dashed",color='red', size = 0.8)+
  geom_vline(data=schoolsNotNearHomicides,aes(xintercept = mean(SubsidizedLunchFraction,na.rm = TRUE)+sd(SubsidizedLunchFraction,na.rm = TRUE)),
             linetype = "dashed",color='red', size = 0.4)+
  geom_vline(data=schoolsNotNearHomicides,aes(xintercept = mean(SubsidizedLunchFraction,na.rm = TRUE)-sd(SubsidizedLunchFraction,na.rm = TRUE)),
             linetype = "dashed",color='red', size = 0.4)
a
## Warning: Removed 1446 rows containing non-finite values (stat_density).
## Warning: Removed 8110 rows containing non-finite values (stat_density).

Apply t-test to determine significant difference in means for Subsidized Lunches between schools nearby and distant from homicides.

t.test(schoolsNearHomicides$SubsidizedLunchFraction,
       schoolsNotNearHomicides$SubsidizedLunchFraction)
## 
##  Welch Two Sample t-test
## 
## data:  schoolsNearHomicides$SubsidizedLunchFraction and schoolsNotNearHomicides$SubsidizedLunchFraction
## t = 103, df = 3478.9, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.3352634 0.3482748
## sample estimates:
## mean of x mean of y 
## 0.8926826 0.5509135

```

Dataset preparation for command line processing:

Merge schoolNceso state xls datasets into combined tibble

##```{r}
library(readxl) setwd(“nces”)
### collect the xls files
ncesList<-system("ls *.xls",intern=TRUE)
nces<-tibble()
#
#get each nces.xls as tibble, bind them to nces tibble
#
for ( i in ncesList) {
t<-read_xls(i,skip=19)
nces<-rbind(nces,t)
}

write_csv(nces,“nces.csv”)

#```

Extract matching homicide cities from nces for geocoding:

#{r}
ncesRaw<-read_csv(param$schoolNcesUrl)
#normalize column names, removing asterisks and spaces
names(ncesRaw)<-gsub("[ *]*“,”“,names(ncesRaw))
#eliminate rows containing cities not counted
#after first setting city names to titled case
nces<-ncesRaw %>%
  mutate(City=str_to_title(City)) %>%
  inner_join(homicideCityCount,by=c(”City“=”city“))
#extract NCESSchoolID,StreetAddress,City,State,ZIP
geocode<- nces %>%
select(NCESSchoolID,StreetAddress,City,State,ZIP)
write_csv(geocode,”geocode.csv")

NB: geocode.csv gets encoded using the US Census Bureau API

from the command line:
curl –form –form benchmark=9 https://geocoding.geo.census.gov/geocoder/locations/addressbatch –output result.csv
The rows of result.csv where encoding has failed will contain “No_Match” or “Tie”. Successful encodings will have “Match” and an indicator of the encoding was Exact or “Non_Exact”. The processed file was uploaded to GitHub. Its url is stored here in param$schoolNcesGeocodingsUrl