sdutkyData110Project1TheOtherVictims.Rmd should run in your environment
’’ The Washington Post School Shootings
School shootings are horrific events. By 2018 the Post counted 144 children, teachers and others killed in school assaults and 302 physically injured. All present, however, suffer trauma. Using school enrollment, the Post tallied 228,000 children at 234 school incidents.
Fortunately, school shootings are rare events. Many children, however, live in proximity to homicidal violence. This is rarely newsworthy.
Mimicking the Post, I set out to count the homicides occuring within a threshold distance of a school and, using school enrollments, estimating the number of students affected. This involved an API and several datasets:
Combining these datasets discloses, over a period of ten years, nearly two million students enrolled in schools in which a homicide was reported within a mean distance of 142 meters. These schools reported a mean rate of 89% free and reduced price lunches. This indicates that a large majority of these students lived in economically precarious households.
How close in time and space must a child be near to a homicide to become traumatized? However one answers this, I believe that the data here indicate that the potential trauma suffered by the students enrolled at the sites of school shootings might be eclipsed by the nation wide toll of routine violence.
# load libraries
library(tidyverse)
## ── Attaching packages ────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1 ✔ purrr 0.3.2
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 0.8.3 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ── Conflicts ───────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(geosphere)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
param<-list()
param$halfBoxDistance<-174 # meters to sides of bounding box enclosing homicide. Used to find schools inside box
param$homicideUrl="https://github.com/sdutky/mcData110/raw/master/otherVictims/homicide-data.csv"
param$schoolCoordUrl="https://github.com/sdutky/mcData110/raw/master/otherVictims/public-school-locations-2017-18.csv"
param$schoolNcesUrl="https://github.com/sdutky/mcData110/raw/master/otherVictims/nces.csv"
param$schoolNcesGeocodingsUrl="https://github.com/sdutky/mcData110/raw/master/otherVictims/ncesGeocodings.csv"
param$stateBboxUrl="https://gist.githubusercontent.com/a8dx/2340f9527af64f8ef8439366de981168/raw/81d876daea10eab5c2675811c39bcd18a79a9212/US_State_Bounding_Boxes.csv"
homicide<-read_csv(param$homicideUrl)
## Parsed with column specification:
## cols(
## uid = col_character(),
## reported_date = col_double(),
## victim_last = col_character(),
## victim_first = col_character(),
## victim_race = col_character(),
## victim_age = col_character(),
## victim_sex = col_character(),
## city = col_character(),
## state = col_character(),
## lat = col_double(),
## lon = col_double(),
## disposition = col_character()
## )
homicide$victim_age<-as.numeric(homicide$victim_age)
## Warning: NAs introduced by coercion
#str(homicide)
#summary(homicide)
# drop rows where either lat or lon is NA
homicide<-homicide %>%
drop_na(lat,lon)
summary(homicide)
## uid reported_date victim_last
## Length:52119 Min. : 20070101 Length:52119
## Class :character 1st Qu.: 20100318 Class :character
## Mode :character Median : 20121216 Mode :character
## Mean : 20130905
## 3rd Qu.: 20150910
## Max. :201511105
##
## victim_first victim_race victim_age victim_sex
## Length:52119 Length:52119 Min. : 0.0 Length:52119
## Class :character Class :character 1st Qu.: 22.0 Class :character
## Mode :character Mode :character Median : 28.0 Mode :character
## Mean : 31.8
## 3rd Qu.: 40.0
## Max. :102.0
## NA's :2997
## city state lat lon
## Length:52119 Length:52119 Min. :25.73 Min. :-122.51
## Class :character Class :character 1st Qu.:33.77 1st Qu.: -96.00
## Mode :character Mode :character Median :38.52 Median : -87.71
## Mean :37.03 Mean : -91.47
## 3rd Qu.:40.03 3rd Qu.: -81.76
## Max. :45.05 Max. : -71.01
##
## disposition
## Length:52119
## Class :character
## Mode :character
##
##
##
##
#fix error
a<-seq_along(homicide$state)[homicide$state=="wI"]
homicide$state[a]="WI"
print(sort(unique(homicide$state)))
## [1] "AL" "AZ" "CA" "CO" "DC" "FL" "GA" "IL" "IN" "KY" "LA" "MA" "MD" "MI"
## [15] "MN" "MO" "NC" "NE" "NM" "NV" "NY" "OH" "OK" "PA" "TN" "TX" "VA" "WI"
bb<-read_csv(param$stateBboxUrl)
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
## X1 = col_double(),
## STATEFP = col_character(),
## STUSPS = col_character(),
## NAME = col_character(),
## xmin = col_double(),
## ymin = col_double(),
## xmax = col_double(),
## ymax = col_double()
## )
names(bb)[3]<-"state"
bb<-bb[,c("state","xmin","xmax","ymin","ymax")]
hDropped<-nrow(homicide) # get snapshot before
homicide<- homicide %>%
inner_join(bb,by="state")
homicide<- homicide %>%
filter(!(lat<=xmax & lat>=xmin & lon<=ymax & lon>=ymin))
hDropped<-hDropped-nrow(homicide) # and after to compare to before
# All pass filter: no drops
# drop xmin,xmax,ymin,ymax:
homicide<-homicide %>% select(-c(xmin,xmax,ymin,ymax))
bBox<-function(lat,lon,threshold=param$halfBoxDistance) {
# return list of mid-point lats & lons of bounding box
# determine the meter distance of 1 degree at location
dLon<-function(lon,lat) distGeo(c(lon,lat),c(lon+1,lat))
dLat<-function(lon,lat) distGeo(c(lon,lat),c(lon,lat+1))
dlon<-mapply(dLon, lon, lat)
dlat<-mapply(dLat, lon, lat)
eastSide<-lon+threshold/dlon
westSide<-lon-threshold/dlon
northSide<-lat+threshold/dlat
southSide<-lat-threshold/dlat
return(tibble(eastSide=eastSide,westSide=westSide,
southSide=southSide,northSide=northSide))
}
boxes<-bBox(homicide$lat,homicide$lon)
homicide<-cbind(homicide,boxes)
allSchools<-read_csv(param$schoolNcesUrl)
## Parsed with column specification:
## cols(
## .default = col_character(),
## `NCES School ID` = col_double(),
## `NCES District ID` = col_double(),
## ZIP = col_double(),
## `ZIP 4-digit` = col_double()
## )
## See spec(...) for full column specifications.
summary(allSchools) # pre-cleaning
## NCES School ID State School ID NCES District ID
## Min. :1.000e+10 Length:77862 Min. : 100005
## 1st Qu.:1.202e+11 Class :character 1st Qu.:1201500
## Median :2.681e+11 Mode :character Median :2680620
## Mean :2.724e+11 Mean :2723611
## 3rd Qu.:4.013e+11 3rd Qu.:4012615
## Max. :5.517e+11 Max. :5517220
##
## State District ID Low Grade* High Grade*
## Length:77862 Length:77862 Length:77862
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## School Name District County Name*
## Length:77862 Length:77862 Length:77862
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## Street Address City State ZIP
## Length:77862 Length:77862 Length:77862 Min. : 1001
## Class :character Class :character Class :character 1st Qu.:32750
## Mode :character Mode :character Mode :character Median :54986
## Mean :54655
## 3rd Qu.:78043
## Max. :97635
##
## ZIP 4-digit Phone Locale Code* Locale*
## Min. : 1 Length:77862 Length:77862 Length:77862
## 1st Qu.:1710 Class :character Class :character Class :character
## Median :3206 Mode :character Mode :character Mode :character
## Mean :3901
## 3rd Qu.:5414
## Max. :9999
## NA's :23865
## Charter Magnet* Title I School*
## Length:77862 Length:77862 Length:77862
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## Title 1 School Wide* Students* Teachers*
## Length:77862 Length:77862 Length:77862
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## Student Teacher Ratio* Free Lunch* Reduced Lunch*
## Length:77862 Length:77862 Length:77862
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
names(allSchools)<-str_to_title(names(allSchools))
names(allSchools)<-gsub("[[:punct:] ]*","",names(allSchools))
allSchools<-allSchools %>%
mutate(
City=str_to_title(City),
FreeLunch=as.numeric(FreeLunch),
ReducedLunch=as.numeric(ReducedLunch),
Students=as.numeric(Students),
Teachers=as.numeric(Teachers),
StudentTeacherRatio=as.numeric(StudentTeacherRatio),
SubsidizedLunchFraction=(FreeLunch+ReducedLunch)/Students,
NcesSchoolId=as.character(NcesSchoolId)
) %>%
drop_na(Students,Teachers) %>%
filter(Students<5000) %>% # remove 8 cyber academies
filter(Teachers<1000) %>% # Florida Virtual School
filter(Students>=1 & Teachers>=1)
## Warning: NAs introduced by coercion
## Warning: NAs introduced by coercion
## Warning: NAs introduced by coercion
## Warning: NAs introduced by coercion
## Warning: NAs introduced by coercion
allSchools$SubsidizedLunchFraction[allSchools$SubsidizedLunchFraction>1]<-NA
summary(allSchools)
## NcesSchoolId StateSchoolId NcesDistrictId
## Length:70151 Length:70151 Min. : 100005
## Class :character Class :character 1st Qu.:1300840
## Mode :character Mode :character Median :2710060
## Mean :2779878
## 3rd Qu.:4021750
## Max. :5517220
##
## StateDistrictId LowGrade HighGrade
## Length:70151 Length:70151 Length:70151
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## SchoolName District CountyName
## Length:70151 Length:70151 Length:70151
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## StreetAddress City State Zip
## Length:70151 Length:70151 Length:70151 Min. : 1001
## Class :character Class :character Class :character 1st Qu.:32063
## Mode :character Mode :character Mode :character Median :53959
## Mean :53705
## 3rd Qu.:77401
## Max. :97635
##
## Zip4Digit Phone LocaleCode Locale
## Min. : 1 Length:70151 Length:70151 Length:70151
## 1st Qu.:1699 Class :character Class :character Class :character
## Median :3176 Mode :character Mode :character Mode :character
## Mean :3867
## 3rd Qu.:5322
## Max. :9999
## NA's :20995
## Charter Magnet TitleISchool
## Length:70151 Length:70151 Length:70151
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## Title1SchoolWide Students Teachers StudentTeacherRatio
## Length:70151 Min. : 1.0 Min. : 1.00 Min. : 0.00
## Class :character 1st Qu.: 301.0 1st Qu.: 19.08 1st Qu.: 13.50
## Mode :character Median : 477.0 Median : 29.30 Median : 15.70
## Mean : 568.8 Mean : 34.67 Mean : 16.56
## 3rd Qu.: 698.0 3rd Qu.: 42.97 3rd Qu.: 18.60
## Max. :4841.0 Max. :308.35 Max. :852.00
##
## FreeLunch ReducedLunch SubsidizedLunchFraction
## Min. : 0 Min. : 0.00 Min. :0.000
## 1st Qu.: 93 1st Qu.: 8.00 1st Qu.:0.347
## Median : 201 Median : 25.00 Median :0.560
## Mean : 273 Mean : 36.07 Mean :0.557
## 3rd Qu.: 373 3rd Qu.: 47.00 3rd Qu.:0.785
## Max. :3866 Max. :2928.00 Max. :1.000
## NA's :7180 NA's :8612 NA's :8652
schoolCoord<-read_csv(param$schoolCoordUrl)
## Parsed with column specification:
## cols(
## .default = col_character(),
## ZIP = col_double(),
## LOCALE = col_double(),
## LAT = col_double(),
## LON = col_double(),
## CBSATYPE = col_double()
## )
## See spec(...) for full column specifications.
names(schoolCoord)<-str_to_title(names(schoolCoord))
geoSchools<- schoolCoord[,c("Ncessch","Lat","Lon")] %>%
inner_join(allSchools,by=c("Ncessch"="NcesSchoolId" ))
homicideCityCount<-homicide %>%
group_by(city) %>%
summarise(homicides=n())
schoolsCityCount<- allSchools %>%
group_by(City) %>%
summarise(schools=n())
homicidesJoinedSchoolsByCity<-inner_join(homicideCityCount,schoolsCityCount,by=c("city" ="City"))
print(homicidesJoinedSchoolsByCity,n=300)
## # A tibble: 50 x 3
## city homicides schools
## <chr> <int> <int>
## 1 Albuquerque 375 174
## 2 Atlanta 973 133
## 3 Baltimore 2827 287
## 4 Baton Rouge 424 97
## 5 Birmingham 799 60
## 6 Boston 612 17
## 7 Buffalo 520 94
## 8 Charlotte 687 173
## 9 Chicago 5535 547
## 10 Cincinnati 694 152
## 11 Columbus 1079 291
## 12 Dallas 1567 373
## 13 Denver 312 237
## 14 Detroit 2519 171
## 15 Durham 276 69
## 16 Fort Worth 549 217
## 17 Fresno 483 159
## 18 Houston 2933 681
## 19 Indianapolis 1322 208
## 20 Jacksonville 1168 208
## 21 Kansas City 1190 144
## 22 Las Vegas 1371 268
## 23 Long Beach 378 78
## 24 Los Angeles 2257 399
## 25 Louisville 575 185
## 26 Memphis 1514 207
## 27 Miami 744 265
## 28 Milwaukee 1115 167
## 29 Minneapolis 366 117
## 30 Nashville 757 135
## 31 New Orleans 1434 66
## 32 New York 625 344
## 33 Oakland 947 100
## 34 Oklahoma City 672 145
## 35 Omaha 409 148
## 36 Philadelphia 3036 308
## 37 Phoenix 914 250
## 38 Pittsburgh 631 143
## 39 Richmond 429 183
## 40 Sacramento 374 178
## 41 San Antonio 832 457
## 42 San Bernardino 275 69
## 43 San Diego 459 233
## 44 San Francisco 663 104
## 45 Savannah 246 52
## 46 St. Louis 1672 28
## 47 Stockton 443 104
## 48 Tampa 208 160
## 49 Tulsa 584 104
## 50 Washington 1345 244
getMatchingSchoolsByCity<-function(cityArg,tblHomicides,tblGeoSchools) {
as_tibble( tblHomicides %>%
filter(city==cityArg) %>%
mutate(dummy=TRUE) %>%
inner_join(tblGeoSchools %>%
filter(City==cityArg) %>%
mutate(dummy=TRUE)) %>%
filter(Lat>southSide,Lat<northSide,Lon>westSide,Lon<eastSide) %>%
select(-dummy,-city,-state)
)
}
schoolsNearHomicides<-tibble()
for (city in homicideCityCount$city) {
schoolsNearHomicides<-rbind(schoolsNearHomicides,getMatchingSchoolsByCity(city,homicide,geoSchools))
}
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
## Joining, by = "dummy"
print(
paste(
nrow(schoolsNearHomicides),
"homicides occuring within",
param$halfBoxDistance,
"meters of a school traumatizing",
sum(schoolsNearHomicides$Students),
"students")
)
## [1] "4212 homicides occuring within 174 meters of a school traumatizing 1969084 students"
schoolsNearHomicides <- schoolsNearHomicides %>%
arrange(reported_date)
schoolsNearHomicides$cumHomicides<-cumsum(rep(1,nrow(schoolsNearHomicides)))
schoolsNearHomicides$cumTrauma<-cumsum(schoolsNearHomicides$Students)
f<-function(lat,lon,Lat,Lon) distGeo(c(lon,lat),c(Lon,Lat))
schoolsNearHomicides$distance<-mapply(f,schoolsNearHomicides$lat,schoolsNearHomicides$lon,schoolsNearHomicides$Lat,schoolsNearHomicides$Lon)
schoolsNotNearHomicides<-allSchools %>%
anti_join(schoolsNearHomicides,
by=c("NcesSchoolId"="Ncessch"))
print.data.frame( schoolsNearHomicides %>%
group_by(City) %>%
summarise(exposure=sum(Students),homicides=n()) %>%
arrange(desc(exposure)) %>%
mutate(rate=round(exposure/homicides)) %>%
mutate(exposure=format(exposure,big.mark = ",",scientific = FALSE)),
width=120)
## City exposure homicides rate
## 1 Chicago 432,727 942 459
## 2 Philadelphia 360,567 607 594
## 3 Baltimore 185,097 455 407
## 4 Washington 143,185 396 362
## 5 Houston 100,499 159 632
## 6 Milwaukee 61,449 149 412
## 7 New Orleans 56,894 111 513
## 8 Detroit 51,384 124 414
## 9 Dallas 49,321 97 508
## 10 Miami 43,413 98 443
## 11 New York 39,448 84 470
## 12 Memphis 37,426 85 440
## 13 Las Vegas 36,468 51 715
## 14 Cincinnati 31,166 56 557
## 15 Buffalo 29,439 51 577
## 16 Columbus 26,783 84 319
## 17 Kansas City 26,081 66 395
## 18 Jacksonville 22,322 57 392
## 19 Pittsburgh 20,243 60 337
## 20 Indianapolis 18,476 47 393
## 21 San Antonio 18,301 41 446
## 22 Louisville 17,302 34 509
## 23 Atlanta 15,753 28 563
## 24 Tulsa 15,294 28 546
## 25 Nashville 14,921 32 466
## 26 Minneapolis 14,771 53 279
## 27 Richmond 14,311 29 493
## 28 Omaha 11,766 20 588
## 29 Baton Rouge 11,410 26 439
## 30 Tampa 9,615 14 687
## 31 Oklahoma City 9,330 18 518
## 32 Fort Worth 8,707 14 622
## 33 Boston 8,670 17 510
## 34 Charlotte 8,315 13 640
## 35 Albuquerque 7,426 26 286
## 36 St. Louis 6,045 29 208
## 37 Durham 2,554 7 365
## 38 Savannah 2,205 4 551
g<-ggplot(schoolsNearHomicides, aes(x=ymd(reported_date))) + # basic graphical object
labs(title="cumulative homicides and students exposed")+
#scale_x_continuous(name="date")+
scale_y_continuous(name="counts", trans="log10")+
xlab("date")+
#ylab("count")+
geom_line(aes(y=(cumTrauma)), colour="red",size=1) + # first layer
geom_line(aes(y=(cumHomicides)), colour="green",size=1)+ # second layer
scale_colour_manual("",
labels = c("cT", "cH"),
values = c("red", "green"))
#scale_color_discrete(name = "incidents", labels = c("cumHomicides", "cumTrauma"))
g
title="Distribtution of Distances of Schools from Homicides"
cols=c("Not Near Homicides"="#FF0000","Near Homicides"="#00FF00")
a<-ggplot()+
labs(title=title,x="distance in meters",
y="count")+
geom_density(data=schoolsNearHomicides, aes(x=distance,y=..count..),color='green',show.legend=TRUE) +
geom_vline(data=schoolsNearHomicides,aes(xintercept = mean(distance,na.rm = TRUE)),
linetype = "dashed",color='green', size = 0.8)+
geom_vline(data=schoolsNearHomicides,aes(xintercept = mean(distance,na.rm = TRUE)+sd(distance,na.rm = TRUE)),
linetype = "dashed",color='green', size = 0.4)+
geom_vline(data=schoolsNearHomicides,aes(xintercept = mean(distance,na.rm = TRUE)-sd(distance,na.rm = TRUE)),
linetype = "dashed",color='green', size = 0.4)
a
title="Comparison of Schools Nearby and Distant from Homicides"
cols=c("Not Near Homicides"="#FF0000","Near Homicides"="#00FF00")
a<-ggplot()+
labs(title=title,x="Fraction of Students receiving Subsidized Lunch",
y="count")+
scale_colour_manual(name="Schools",values=cols)+
scale_fill_manual(name="Schools",values=cols) +
geom_density(data=schoolsNearHomicides, aes(x=SubsidizedLunchFraction,y=..count..),color='green',show.legend=TRUE) +
geom_vline(data=schoolsNearHomicides,aes(xintercept = mean(SubsidizedLunchFraction,na.rm = TRUE)),
linetype = "dashed",color='green', size = 0.8)+
geom_vline(data=schoolsNearHomicides,aes(xintercept = mean(SubsidizedLunchFraction,na.rm = TRUE)+sd(SubsidizedLunchFraction,na.rm = TRUE)),
linetype = "dashed",color='green', size = 0.4)+
geom_vline(data=schoolsNearHomicides,aes(xintercept = mean(SubsidizedLunchFraction,na.rm = TRUE)-sd(SubsidizedLunchFraction,na.rm = TRUE)),
linetype = "dashed",color='green', size = 0.4)+
geom_density(data=schoolsNotNearHomicides, aes(x=SubsidizedLunchFraction,y=..count..), color='red',show.legend=TRUE)+
geom_vline(data=schoolsNotNearHomicides,aes(xintercept = mean(SubsidizedLunchFraction,na.rm = TRUE)),
linetype = "dashed",color='red', size = 0.8)+
geom_vline(data=schoolsNotNearHomicides,aes(xintercept = mean(SubsidizedLunchFraction,na.rm = TRUE)+sd(SubsidizedLunchFraction,na.rm = TRUE)),
linetype = "dashed",color='red', size = 0.4)+
geom_vline(data=schoolsNotNearHomicides,aes(xintercept = mean(SubsidizedLunchFraction,na.rm = TRUE)-sd(SubsidizedLunchFraction,na.rm = TRUE)),
linetype = "dashed",color='red', size = 0.4)
a
## Warning: Removed 1446 rows containing non-finite values (stat_density).
## Warning: Removed 8110 rows containing non-finite values (stat_density).
t.test(schoolsNearHomicides$SubsidizedLunchFraction,
schoolsNotNearHomicides$SubsidizedLunchFraction)
##
## Welch Two Sample t-test
##
## data: schoolsNearHomicides$SubsidizedLunchFraction and schoolsNotNearHomicides$SubsidizedLunchFraction
## t = 103, df = 3478.9, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.3352634 0.3482748
## sample estimates:
## mean of x mean of y
## 0.8926826 0.5509135
```
##```{r}
library(readxl) setwd(“nces”)
### collect the xls files
ncesList<-system("ls *.xls",intern=TRUE)
nces<-tibble()
#
#get each nces.xls as tibble, bind them to nces tibble
#
for ( i in ncesList) {
t<-read_xls(i,skip=19)
nces<-rbind(nces,t)
}
write_csv(nces,“nces.csv”)
#```
#{r}
ncesRaw<-read_csv(param$schoolNcesUrl)
#normalize column names, removing asterisks and spaces
names(ncesRaw)<-gsub("[ *]*“,”“,names(ncesRaw))
#eliminate rows containing cities not counted
#after first setting city names to titled case
nces<-ncesRaw %>%
mutate(City=str_to_title(City)) %>%
inner_join(homicideCityCount,by=c(”City“=”city“))
#extract NCESSchoolID,StreetAddress,City,State,ZIP
geocode<- nces %>%
select(NCESSchoolID,StreetAddress,City,State,ZIP)
write_csv(geocode,”geocode.csv")
from the command line:
curl –form addressFile=@geocode.csv –form benchmark=9 https://geocoding.geo.census.gov/geocoder/locations/addressbatch –output result.csv
The rows of result.csv where encoding has failed will contain “No_Match” or “Tie”. Successful encodings will have “Match” and an indicator of the encoding was Exact or “Non_Exact”. The processed file was uploaded to GitHub. Its url is stored here in param$schoolNcesGeocodingsUrl