So, you have some data… perhaps you want to look at how your patients are responding to an exercise intervention for obesity and see if there’s a difference in compliance and weight loss success that could be related to violence near home that make it unsafe to play outside.
The City of Philadelphia supplies information about shootings (including officer-involved shootings) which includes data about the shooting victim and the location. Here, we’re really interested in the location of shootings over the past few years, to understand what parts of Philadelphia are more prone to this specific kind of violence.
To see more information about this dataset, please visit https://www.opendataphilly.org/dataset/shooting-victims/resource/a6240077-cbc7-46fb-b554-39417be606ee?inner_span=True.
For our purposes, we’re going to get the bare minimum of information: latitude and longitude. The API endpoint is described in the link above and uses a SQL query to select only the data we care about. Because our query has spaces and other special characters, we need to “encode” it for request.
The data will come in as json, which we’ll parse.
library(jsonlite)
url <- URLencode('https://www.opendataphilly.org/api/action/datastore_search_sql?sql=SELECT lat, lng from "a6240077-cbc7-46fb-b554-39417be606ee"')
shooting_data <- fromJSON(url)
We can examine the shooting data by using R’s str (structure) command:
str(shooting_data)
## List of 3
## $ help : chr "https://www.opendataphilly.org/api/3/action/help_show?name=datastore_search_sql"
## $ success: logi TRUE
## $ result :List of 3
## ..$ records:'data.frame': 2500 obs. of 2 variables:
## .. ..$ lat: chr [1:2500] "39.922968633" "40.0296637640001" "40.037804839" "39.9682945860001" ...
## .. ..$ lng: chr [1:2500] "-75.18870837" "-75.1205859089999" "-75.139286339" "-75.223324033" ...
## ..$ fields :'data.frame': 2 obs. of 2 variables:
## .. ..$ type: chr [1:2] "numeric" "numeric"
## .. ..$ id : chr [1:2] "lat" "lng"
## ..$ sql : chr "SELECT lat, lng from \"a6240077-cbc7-46fb-b554-39417be606ee\""
Here we see that we have a data frame, accessible at shooting_data$result$records:
head(shooting_data$result$records, 6)
## lat lng
## 1 39.922968633 -75.18870837
## 2 40.0296637640001 -75.1205859089999
## 3 40.037804839 -75.139286339
## 4 39.9682945860001 -75.223324033
## 5 39.995426509 -75.126076954
## 6 40.0429494850001 -75.1617552229999
If we wanted to, we could easily create a map of these shootings, just based on latitude and longitude. Since latitude and longitude are currently in “chr” (character) format, we’ll make them numeric so that we can do math on them. We’ll create a map that’s centered on the mean latitude and longitude of all our shootings, and which is appropriately zoomed in (you might have to experiment with the zoom factor).
We’re going to add a standard road map below to show more context, using addTiles.
library(leaflet)
library(leaflet.extras)
library(dplyr)
shootings <- shooting_data$result$records
shootings$lat <- as.numeric(shootings$lat)
shootings$lng <- as.numeric(shootings$lng)
shootings %>%
leaflet() %>%
addTiles() %>%
setView(lng = mean(as.numeric(shootings$lng), na.rm=TRUE),
lat = mean(as.numeric(shootings$lat), na.rm=TRUE), zoom = 10) %>%
addMarkers(clusterOptions = markerClusterOptions()) %>%
suspendScroll()
What’s more likely, however, is that we want to use polygon data to create a map that shows how much a particular area is affected. This is because we want to create combined data – we want to put information about our patients or research subjects along with the level of violence they are exposed to. Instead of using latitude and longitude, we’ll gather the number of shootings per Census tract, which we can then use as a proxy for violence exposure for the patients and subjects who live in that Census tract. It’s a sort of “binning”, but using the existing “bins” of Census tracts.
Let’s start by getting a map:
library(rgdal)
philadelphiaCensusTracts <- readOGR("http://data.phl.opendata.arcgis.com/datasets/8bc0786524a4486bb3cf0f9862ad0fbf_0.geojson")
## OGR data source with driver: GeoJSON
## Source: "http://data.phl.opendata.arcgis.com/datasets/8bc0786524a4486bb3cf0f9862ad0fbf_0.geojson", layer: "OGRGeoJSON"
## with 384 features
## It has 14 fields
Now what we’d like to do is get the shootings-per-tract data, which we can then combine with our research or clinical data to see if violence near home has any effect on our outcomes. To do this, we take the latitude and longitude of our shootings and transform them slightly so that they are understood as spatial coordinates, not just pairs of numbers. We’ll use the same map projection used in our original philadelphiaCensusTracts.
library(sp)
coordinates <- SpatialPoints(shootings[c("lng", "lat")])
proj4string(coordinates) <- proj4string(philadelphiaCensusTracts)
Let’s now apply what we know about our polygons (from philadelphiaCensusTracts) and apply that to our points. We’ll end up with a table that has one row for each shooting coordinate. Essentially, what we’re doing is taking each point, lining it up with a matching polygon, and then getting the data about that polygon, which came along with the geoJSON file we downloaded.
shooting_tract_data <- over(coordinates, philadelphiaCensusTracts)
head(shooting_tract_data)
## OBJECTID STATEFP10 COUNTYFP10 TRACTCE10 GEOID10 NAME10
## 1 198 42 101 003600 42101003600 36
## 2 112 42 101 029000 42101029000 290
## 3 356 42 101 028200 42101028200 282
## 4 57 42 101 010300 42101010300 103
## 5 141 42 101 017602 42101017602 176.02
## 6 98 42 101 024700 42101024700 247
## NAMELSAD10 MTFCC10 FUNCSTAT10 ALAND10 AWATER10 INTPTLAT10
## 1 Census Tract 36 G5020 S 964539 0 +39.9279255
## 2 Census Tract 290 G5020 S 818133 20001 +40.0296096
## 3 Census Tract 282 G5020 S 855812 0 +40.0348704
## 4 Census Tract 103 G5020 S 281357 0 +39.9678322
## 5 Census Tract 176.02 G5020 S 400458 0 +39.9967182
## 6 Census Tract 247 G5020 S 941266 0 +40.0417034
## INTPTLON10 LOGRECNO
## 1 -075.1920206 10377
## 2 -075.1166058 10605
## 3 -075.1403327 10596
## 4 -075.2254383 10437
## 5 -075.1305528 10507
## 6 -075.1645311 10560
We see the first few lines of the Census data for each of our shootings. For example, the first shooting in our shooting data corresponds to Census tract 36, which is in State 42 (Pennsylvania) and County 101 (Philadelphia County). We can use this to find out how many shootings take place in each Census tract.
shootings_by_census_tract <- shooting_tract_data %>%
group_by(GEOID10) %>%
summarise(num_shootings = n()) %>%
ungroup()
head(shootings_by_census_tract)
## # A tibble: 6 x 2
## GEOID10 num_shootings
## <fct> <int>
## 1 42101000100 2
## 2 42101000200 1
## 3 42101000300 1
## 4 42101000600 2
## 5 42101000700 1
## 6 42101000804 1
Don’t forget that there are some Census tracts that aren’t represented at all in our shooting_tract_data data frame, so let’s make sure we enrich it with all the tracts that aren’t included in the shooting data. We can get those by taking the data frame of our tract data, selecting the list of all the Census tracts in Philadelphia, and making sure that if they weren’t mentioned above, we add them, but with num_shootings equal to 0.
non_shooting_tracts <- philadelphiaCensusTracts@data %>%
select(GEOID10) %>%
filter(!GEOID10 %in% shootings_by_census_tract$GEOID10) %>%
mutate(num_shootings = 0)
head(non_shooting_tracts)
## GEOID10 num_shootings
## 1 42101014300 0
## 2 42101020600 0
## 3 42101001400 0
## 4 42101001900 0
## 5 42101002900 0
## 6 42101005000 0
We can now combine the tracts-with-shootings and the tracts-with-no-shootings to get an overall picture of violence by census tract:
shootings_by_census_tract <- rbind(shootings_by_census_tract, non_shooting_tracts)
Let’s map this! We need to combine the data we aggregated with the data in the map file.
WARNING: PITFALL AHEAD!
We have a really simple merge, right? Census tract information is contained in both datasets, so we have a “hinge”:
merged_data <- merge (philadelphiaCensusTracts@data, shootings_by_census_tract, by = "GEOID10")
Let’s peek!
head(merged_data)
## GEOID10 OBJECTID STATEFP10 COUNTYFP10 TRACTCE10 NAME10
## 1 42101000100 190 42 101 000100 1
## 2 42101000200 191 42 101 000200 2
## 3 42101000300 192 42 101 000300 3
## 4 42101000401 145 42 101 000401 4.01
## 5 42101000402 144 42 101 000402 4.02
## 6 42101000500 193 42 101 000500 5
## NAMELSAD10 MTFCC10 FUNCSTAT10 ALAND10 AWATER10 INTPTLAT10
## 1 Census Tract 1 G5020 S 704909 0 +39.9523827
## 2 Census Tract 2 G5020 S 382478 0 +39.9553999
## 3 Census Tract 3 G5020 S 548342 356 +39.9568780
## 4 Census Tract 4.01 G5020 S 214766 14981 +39.9541871
## 5 Census Tract 4.02 G5020 S 303680 0 +39.9532973
## 6 Census Tract 5 G5020 S 428783 0 +39.9519534
## INTPTLON10 LOGRECNO num_shootings
## 1 -075.1466628 10335 2
## 2 -075.1569775 10336 1
## 3 -075.1716655 10337 1
## 4 -075.1758082 10338 0
## 5 -075.1686952 10339 0
## 6 -075.1581776 10340 0
Looks great, and as merge is wont to do, it’s ordered the new data frame by the merge variable, GEOID10.
Can we just put that back into the map?
No! While maintaining the order of rows is not something we’re used to doing in R, we absolutely have to maintain the original order of rows when we’re using spatial polygon data. That’s because the rows of the data frame correspond 1:1, in order, to the polygons.
There are a few ways to work around this problem, but we have a great, easy to use solution. Let’s take a peek at our original order:
head(philadelphiaCensusTracts@data)
## OBJECTID STATEFP10 COUNTYFP10 TRACTCE10 GEOID10 NAME10
## 0 1 42 101 009400 42101009400 94
## 1 2 42 101 009500 42101009500 95
## 2 3 42 101 009600 42101009600 96
## 3 4 42 101 013800 42101013800 138
## 4 5 42 101 013900 42101013900 139
## 5 6 42 101 014000 42101014000 140
## NAMELSAD10 MTFCC10 FUNCSTAT10 ALAND10 AWATER10 INTPTLAT10
## 0 Census Tract 94 G5020 S 366717 0 +39.9632709
## 1 Census Tract 95 G5020 S 319070 0 +39.9658709
## 2 Census Tract 96 G5020 S 405273 0 +39.9655396
## 3 Census Tract 138 G5020 S 341256 0 +39.9764504
## 4 Census Tract 139 G5020 S 562934 0 +39.9750563
## 5 Census Tract 140 G5020 S 439802 0 +39.9735358
## INTPTLON10 LOGRECNO
## 0 -075.2322437 10429
## 1 -075.2379140 10430
## 2 -075.2435075 10431
## 3 -075.1771771 10468
## 4 -075.1711846 10469
## 5 -075.1630966 10470
Yes, there’s a great “OBJECTID” column that’s numbered incrementally. So it’s very simple for us to take our merged dataframe and re-instate that order:
merged_data <- merged_data[order(merged_data$OBJECTID),]
head(merged_data)
## GEOID10 OBJECTID STATEFP10 COUNTYFP10 TRACTCE10 NAME10
## 95 42101009400 1 42 101 009400 94
## 96 42101009500 2 42 101 009500 95
## 97 42101009600 3 42 101 009600 96
## 134 42101013800 4 42 101 013800 138
## 135 42101013900 5 42 101 013900 139
## 136 42101014000 6 42 101 014000 140
## NAMELSAD10 MTFCC10 FUNCSTAT10 ALAND10 AWATER10 INTPTLAT10
## 95 Census Tract 94 G5020 S 366717 0 +39.9632709
## 96 Census Tract 95 G5020 S 319070 0 +39.9658709
## 97 Census Tract 96 G5020 S 405273 0 +39.9655396
## 134 Census Tract 138 G5020 S 341256 0 +39.9764504
## 135 Census Tract 139 G5020 S 562934 0 +39.9750563
## 136 Census Tract 140 G5020 S 439802 0 +39.9735358
## INTPTLON10 LOGRECNO num_shootings
## 95 -075.2322437 10429 15
## 96 -075.2379140 10430 13
## 97 -075.2435075 10431 10
## 134 -075.1771771 10468 10
## 135 -075.1711846 10469 7
## 136 -075.1630966 10470 9
Now, this data frame we can add back into the spatial polygon data frame safely:
philadelphiaCensusTracts@data <- merged_data
And then mapping is a snap:
shooting_palette <- colorBin("Reds", domain = philadelphiaCensusTracts@data$num_shootings, bins = 5, na.color = "#808080")
leaflet() %>%
addPolygons(
data = philadelphiaCensusTracts,
fillColor = ~shooting_palette(philadelphiaCensusTracts@data$num_shootings),
weight = 1, # border thickness
opacity = 0.5, # border opacity
color = "white", # border color
fillOpacity = 1,
label = paste("Number of Shootings: ", philadelphiaCensusTracts@data$num_shootings, sep = "")) %>%
suspendScroll()
We could also plot the actual shootings on top – that might be duplicative effort but it’s also helpful to make sure there’s nothing obviously wrong in the mapping.
leaflet() %>%
addPolygons(
data = philadelphiaCensusTracts,
fillColor = ~shooting_palette(philadelphiaCensusTracts@data$num_shootings),
weight = 1, # border thickness
opacity = 0.5, # border opacity
color = "white", # border color
fillOpacity = 1,
label = paste("Number of Shootings: ", philadelphiaCensusTracts@data$num_shootings, sep = "")) %>%
addCircles(lat = shootings$lat, lng = shootings$lng, color = "dark grey", weight = 1, radius = 100) %>%
suspendScroll()
So what would that map look like if we hadn’t handled the row ordering problem? Let’s compare:
badPhilly <- philadelphiaCensusTracts
badPhilly@data <- merged_data[order(merged_data$GEOID10),]
leaflet() %>%
addPolygons(
data = badPhilly,
fillColor = ~shooting_palette(badPhilly@data$num_shootings),
weight = 1, # border thickness
opacity = 0.5, # border opacity
color = "white", # border color
fillOpacity = 1,
label = paste("Number of Shootings: ",
badPhilly@data$num_shootings, sep = "")) %>%
addCircles(lat = shootings$lat, lng = shootings$lng, color = "dark grey", weight = 1, radius = 100) %>%
suspendScroll()
You can see that the extra diagnostic layer was helpful to show that the shooting data points don’t line up with the polygons.
Important aside: this data is a work of fiction, so:
Any resemblance to actual persons, living or dead, or actual events is purely coincidental.
Let’s take a peek at our fake data, which purports to give the number of minutes various patients on an obesity-related research project reported exercising.
fake_exercise_data <- read.csv("../Data/fake_exercise_data.csv", stringsAsFactors = FALSE)
head(fake_exercise_data)
## mrn census_tract daily_exercise_minutes
## 1 3755308 274.02 17
## 2 8144402 279.01 25
## 3 4819425 247.00 32
## 4 8974594 141.00 32
## 5 9478166 279.01 73
## 6 1633889 177.01 31
exercise_per_tract <- fake_exercise_data %>%
group_by(census_tract) %>%
summarise(mean_exercise = mean(daily_exercise_minutes)) %>%
ungroup()
head(exercise_per_tract)
## # A tibble: 6 x 2
## census_tract mean_exercise
## <dbl> <dbl>
## 1 1 27
## 2 2 93
## 3 3 87
## 4 4.01 74
## 5 6 61
## 6 7 49.5
WARNING WARNING PITFALL AHEAD!
First, we’ll combine exercise_per_tract with the data found in philadelphiaCensusTracts@data. This time, we’re combining the short name, not the fully qualified GEOID (who knows why?!).
census_tracts <- merge(x=philadelphiaCensusTracts@data, y=exercise_per_tract, by.x="NAME10", by.y="census_tract", all.x = TRUE)
Then we’ll add our enriched data back to the geojson data, so that in addition to the fields it came with, it will now contain the exercise and shooting data we gathered. It’s important to order this data by the OBJECTID so that the correct polygon is associated with the correct data!
philadelphiaCensusTracts@data <- census_tracts[order(census_tracts$OBJECTID),]
Now, let’s create an interactive map! We’ll color the polygons by exercise amount to begin with.
exercise_palette <- colorBin("Blues", domain = philadelphiaCensusTracts@data$mean_exercise, bins = 8, na.color = "#cccccc")
interactive_map <- leaflet(philadelphiaCensusTracts) %>%
addPolygons(
data = philadelphiaCensusTracts,
fillColor = ~exercise_palette(philadelphiaCensusTracts@data$mean_exercise),
weight = 1, # border thickness
opacity = 0.5, # border opacity
color = "white", # border color
fillOpacity = 1) %>%
suspendScroll()
interactive_map
Now, let’s add some labels. We’ll do variable interpolation to create labels that tell what each Census tract is and the exercise and shooting data for that tract:
labels <- sprintf(
"<strong>%s</strong><br/>
Exercise in Minutes: %g <br/>
Number of Shootings: %g",
philadelphiaCensusTracts@data$NAMELSAD10,
philadelphiaCensusTracts@data$mean_exercise,
philadelphiaCensusTracts@data$num_shootings
) %>% lapply(htmltools::HTML)
Then we’ll create the map again, but with labels. This allows the viewer to see at a glance the violence and exercise metrics for each tract!
interactive_map <- leaflet(philadelphiaCensusTracts) %>%
setView(lng = mean(as.numeric(shootings$lng), na.rm=TRUE),
lat = mean(as.numeric(shootings$lat), na.rm=TRUE), zoom = 11) %>%
addPolygons(
data = philadelphiaCensusTracts,
fillColor = ~exercise_palette(philadelphiaCensusTracts@data$mean_exercise),
weight = 1, # border thickness
opacity = 0.5, # border opacity
color = "white", # border color
fillOpacity = 1,
label = labels,
labelOptions = labelOptions(
style = list("font-weight" = "normal", padding = "3px 8px"),
textsize = "15px",
direction = "auto")) %>%
suspendScroll()
interactive_map
So, we have a couple of polygon maps and a point map – how can we combine them all in Leaflet? We’re going to use a layer controller.
my_map <- leaflet() %>%
setView(lng = mean(philadelphiaCensusTracts@bbox['x',], na.rm=TRUE),
lat = mean(philadelphiaCensusTracts@bbox['y',], na.rm=TRUE), zoom = 11) %>%
addPolygons(data = philadelphiaCensusTracts,
weight = 1, # border thickness
opacity = 0.5, # border opacity
color = "grey", # border color
fillColor = ~shooting_palette(philadelphiaCensusTracts@data$num_shootings),
fillOpacity = 1,
label = labels,
labelOptions = labelOptions(
style = list("font-weight" = "normal", padding = "3px 8px"),
textsize = "15px",
direction = "auto"),
group = "Frequency of Shootings With Injury"
) %>%
addPolygons(
data = philadelphiaCensusTracts,
weight = 1, # border thickness
opacity = 0.5, # border opacity
color = "grey", # border color
fillColor = ~exercise_palette(philadelphiaCensusTracts@data$mean_exercise),
fillOpacity = 1,
label = labels,
labelOptions = labelOptions(
style = list("font-weight" = "normal", padding = "3px 8px"),
textsize = "15px",
direction = "auto"),
group = "Mean Exercise in Minutes"
) %>%
addCircles(lat = shootings$lat, lng = shootings$lng, color = "dark grey", weight = 1, radius = 100, group = "Frequency of Shootings With Injury") %>%
addCircles(lat = shootings$lat, lng = shootings$lng, color = "dark grey", weight = 1, radius = 100, group = "Mean Exercise in Minutes") %>%
addLayersControl(
baseGroups = c("Frequency of Shootings With Injury", "Mean Exercise in Minutes"),
options = layersControlOptions(collapsed = FALSE)
) %>%
suspendScroll()
my_map
```