Using mappers.r

Mappers.r is a package of functions designed to easily access, plot, and analyze spatial data in New Orleans. The file can be accessed here. While some of the functions can be used in other areas, the package was designed with New Orleans in mind (though most of the functions could be adopted to use in other areas). This package is not designed as a GIS tool, and many advanced GIS processes are well beyond R's capabilities. However, it is designed to allow non-GIS experts to easily perfom basic (though still powerful) spatial tasks. To help newcomers use the package, this document will give a brief run-through of spatial the functions in the package.

One technical note: throughout the document, we will refer to spatial objects and capitalized Spatial objects. The former indicates the literal meaning of the word: basically anything that can be mapped. The latter refers to classes of R objects within the sp package. For more information on the structure of this object, see this guide to the package.

Accessing Spatial Data

R's ability to read and process spatial data is often clunky at best. These functions are designed to make them a little less so. The types of data that this package works with are .csv files with fields to identify geography (which R sees as regular data frames and need to be converted to Spatial objects) and shapefiles (which R already recognizes as Spatial objects).

Shapefiles

Because shapefiles are already spatial in nature, they are the easiest to work with. If we have a shapefile on a local drive, it can be read into R with the function readOGR() in the rgdal package, which reads in shapefiles as Spatial objects. Often, you will be dealing with a shapefile on the web. To get this, it can be downloaded, unzipped, and read in with readOGR(). Or, we can just use zipToSpatial(), which does all of that for us.

nola.url <- "http://data.nola.gov/api/geospatial/2b2j-u6kh?method=export&format=Shapefile"
nola <- zipToSpatial(nola.url)
## OGR data source with driver: ESRI Shapefile 
## Source: "JunkDataForZips", layer: "NOLA_Boundary"
## with 2 features
## It has 9 fields
plot(nola)

plot of chunk unnamed-chunk-2

This function is made particularly useful by several wrapper functions that read in commonly-used New Orleans geographies. These include getNOLA() for the parish boundary, getTracts() for New Orleans census tracts, and more. The wrapper functions save a cached version of the object so that we don't have to read in the data every time we call the function, which is helpful because reading shapefiles can be very slow for larger objects.

Tables with Coordinates

Unfortunately, we often aren't lucky enough to have our data in shapefiles. With point data, the next best case is a data frame with coordinates - columns that indicate either longitude and latitude or X and Y positions. R can convert a data frame with coordinates into a Spatial object using the function SpatialPointsDataFrame() from the sp package. Mappers has a wrapper for this function, toSpatialPoints(), which takes a data frame and the column names of the coordinates.

Before running the function, an important technical note and disclaimer: there are two types of coordinate reference systems (CRS), the reference frame for a set of coordinates. Geographic systems are based on degrees latitude and longitude for every point on the globe, and are useful for defining exact positions. Projected systems attempt to project the spherical earth onto a flat plane. Their units are a horizontal and vertical distance from a point on that plane, often in feet or meters, which makes them useful for finding the distance between objects. Generally its pretty easy to tell if a coordinate system is geographic or projected, since the units of measurement are so different. However, it is hard to tell which specific CRS is being used (and there are a huge number of types).

This function is designed to work mostly with two types of CRS, though the function also accepts optional user-entered CRS values with the p4string argument. If a CRS is not explicitly defined, the function will determine if the coordinate system is geographic or projected. If the former, it assumes the points are in the WGS 84 system. Otherwise, it assumes they are in the NAD 83 Louisiana South system. These assumptions are completely unrealistic on a global, national, or even state level. But within New Orleans, we often work with data in one of these two systems (though even then it can't be guaranteed). If you are using data outside of the region, the function should be adapted for the context (or be sure to always enter the CRS).

Below, we pull down data on demolitions in New Orleans from the web (using another function in the package, csvFromWeb()) and attempt to make it into a Spatial object (after some regular expressions to extract the latitude and longitude as separate columns). The data can be found here.

demos <- csvFromWeb("https://data.nola.gov/api/views/e3wd-h7q2/rows.csv?accessType=DOWNLOAD")
## Warning in download.file(file.source, file.loc): downloaded length 474527
## != reported length 474527
# latitude and longitude should be in separate columns
demos$lat <- as.numeric(sub(".*?\\((.*?),.*", "\\1", demos$Location.1))
demos$lon <- as.numeric(sub(".*?,(.*?)).*", "\\1", demos$Location.1))
head(demos)
##   ObjectID Program CaseID             Full_Address       Match_Address
## 1       NA     IDC        7770 NORTH CORONET COURT   7770 N CORONET CT
## 2       NA    FEMA                 1842 LESSEPS ST     1842 LESSEPS ST
## 3       NA     IDC         2227 TERPSICHORE STREET 2227 TERPSICHORE ST
## 4       NA    FEMA               7858 N CORONET CT   7858 N CORONET CT
## 5       NA    FEMA                2211 FLORIDA AVE    2211 FLORIDA AVE
## 6       NA    FEMA             3117 N DORGENOIS ST 3117 N DORGENOIS ST
##     GEOPIN Units Council.District       Demolition_Start
## 1 41099488     4                E 04/15/2016 12:00:00 AM
## 2 41067036     2                D 05/28/2015 12:00:00 AM
## 3 41031191     1                B 05/27/2015 12:00:00 AM
## 4 41099536     1                E 05/21/2015 12:00:00 AM
## 5 41078150     1                D 05/20/2015 12:00:00 AM
## 6 41064586     1                D 05/19/2015 12:00:00 AM
##      Demolition_Complete            Location.1      lat       lon
## 1 04/18/2015 12:00:00 AM (30.02623, -89.99997) 30.02623 -89.99997
## 2 05/29/2015 12:00:00 AM (29.97313, -90.03038) 29.97313 -90.03038
## 3 05/29/2015 12:00:00 AM (29.94279, -90.08366) 29.94279 -90.08366
## 4 05/22/2015 12:00:00 AM (30.02668, -89.99792) 30.02668 -89.99792
## 5 05/21/2015 12:00:00 AM (29.98616, -90.05643) 29.98616 -90.05643
## 6 05/20/2015 12:00:00 AM (29.98205, -90.03934) 29.98205 -90.03934
demos.sp <- toSpatialPoints(df = demos, X = "lon", Y = "lat")
## Error in validityMethod(as(object, superClass)): Geographical CRS given to non-conformant data: -90.09981342

As we see above, the function failed. This is because there are points in the data that don't fit within the CRS. To handle situations like this, we can set remove.outliers = TRUE. This option takes a while to run, so it is generally best to use it only when you have invalid coordinates that cause an error in the function. One common instance occurs when some rows have their X and Y coordinates switched, though if that is the case it may be best to just fix the coordinates. A faster (though not always better) option for data cleaning is to set nola.only = TRUE, which removes points outside of New Orleans (which are often the result of bad data). The function's default is to have nola.only = TRUE and remove.outliers = FALSE

demos.sp <- toSpatialPoints(df = demos, X = "lon", Y = "lat", remove.outliers = TRUE)
summary(demos.sp)
## Object of class SpatialPointsDataFrame
## Coordinates:
##                 min       max
## coords.x1 -90.13516 -89.73795
## coords.x2  29.89900  30.16057
## Is projected: FALSE 
## proj4string :
## [+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84
## +towgs84=0,0,0]
## Number of points: 3188
## Data attributes:
##     ObjectID             Program                   CaseID    
##  Min.   :     1   FEMA       : 305                    :1651  
##  1st Qu.:   668   IDC        : 622   DRCES10C002-00001:   1  
##  Median :  1444   NORA       :1752   DRCES10C002-00002:   1  
##  Mean   :  1538   NOSD       : 192   DRCES10C002-00003:   1  
##  3rd Qu.:  2228   SDER       :  27   DRCES10C002-00004:   1  
##  Max.   :107121   SDER (NOSD): 290   DRCES10C002-00005:   1  
##  NA's   :127                         (Other)          :1532  
##                 Full_Address               Match_Address 
##  2563 Industry Street :   2   1901 ST FERDINAND ST:   2  
##  1 CHRISTOPHER COURT  :   1   1919 MONROE ST      :   2  
##  1 CURTIS DR          :   1   2563 INDUSTRY ST    :   2  
##  10 MARYWOOD CT       :   1   3130 N TONTI ST     :   2  
##  10031 FLOSSMOOR DRIVE:   1   1 CHRISTOPHER CT    :   1  
##  10100 Morrison Road  :   1   1 CURTIS DR         :   1  
##  (Other)              :3181   (Other)             :3178  
##      GEOPIN             Units         Council.District
##  Min.   :       0   Min.   :  0.000    :   5          
##  1st Qu.:41066905   1st Qu.:  1.000   A: 296          
##  Median :41088414   Median :  1.000   B: 303          
##  Mean   :40932803   Mean   :  1.288   C: 260          
##  3rd Qu.:41112240   3rd Qu.:  1.000   D:1292          
##  Max.   :41204949   Max.   :195.000   E:1032          
##  NA's   :21                                           
##                Demolition_Start             Demolition_Complete
##  04/29/2011 12:00:00 AM: 147    04/29/2011 12:00:00 AM:  98    
##  04/15/2011 12:00:00 AM:  75    12/09/2010 12:00:00 AM:  47    
##  12/09/2010 12:00:00 AM:  66    04/15/2011 12:00:00 AM:  46    
##  06/27/2011 12:00:00 AM:  52    06/14/2012 12:00:00 AM:  40    
##  02/09/2011 12:00:00 AM:  41    12/22/2011 12:00:00 AM:  39    
##  08/31/2011 12:00:00 AM:  40                          :  37    
##  (Other)               :2767    (Other)               :2881    
##                              Location.1        lat             lon        
##  (29.95655561690, -90.12913298420):   2   Min.   :29.90   Min.   :-90.14  
##  (29.95659822590, -90.11155867320):   2   1st Qu.:29.97   1st Qu.:-90.07  
##  (29.97688704260, -90.04519328210):   2   Median :29.99   Median :-90.04  
##  (29.97971334180, -90.03978937650):   2   Mean   :29.99   Mean   :-90.04  
##  (29.89900081630, -89.99850464070):   1   3rd Qu.:30.02   3rd Qu.:-90.01  
##  (29.90134161400, -89.99442855200):   1   Max.   :30.16   Max.   :-89.74  
##  (Other)                          :3178

Other Tools to Get Spatial Data

Sometimes we won't even have access to coordinates to make a map, but there are still options to make data Spatial. Many City datasets include GeoPINs, a numeric ID tied to each parcel in New Orleans. Unlike coordinates, which have an inherent spatial meaning, GeoPINs are arbitrary numbers that don't actually describe where something is. But you can still convert your data into a Spatial object using the geopinsToPoints() function. This function reads in the parcel layer, and matches the GeoPINs in your data to the GeoPINs of the parcels. The resulting points are based on the center of the matching parcels.

demos <- demos[!duplicated(demos$GEOPIN), ]
demos.sp.geopin <- geopinsToPoints(df = demos, geopin.col = "GEOPIN")
plot(nola)
plot(demos.sp.geopin, add = TRUE)

plot of chunk unnamed-chunk-6

Note that in cases like the demolition data where we have both coordinates and GeoPINs, it is usually better to convert to a Spatial object based on coordinates, since its a more direct look at the data. The geopinsToPoints() function invloves merging data with a Spatial object, which can be a difficult process in R and requires data to be structured in a very specific manner (in the first line of the above chunk, we had to remove a duplicate GeoPIN, or the function would result in an error). Though if there is reason to believe that the GeoPINs are highly accurate but the coordinates are not, it may be best to use GeoPINs.

A final situation, not covered by this package but still important to consider, is when you just have addresses but nothing else. In this case, you can use a geolocator to find the location of those addresses. While it is possible to do this in R with the geocode() function from the ggmap package, which uses the Google Maps API, it is strongly recommended to use the City's geolocator within ArcGIS, as Google maps can be very imprecise, and also has a limit of 2,500 addresses per day

Mapping Data

Basic Mapping

Now that we've downloaded our data, we can map it. As shown above, mapping can be as simple as using the plot() function. This is useful for exploratory analysis and making sure that the data is accurate, though it is limited both practically and aesthetically. Mappers can look at data in three ways: polygons, points, and interactive leaflet maps. For each of these, we have a fairly high degree of control over how the data is symbolized.

Let's start with a simple example: plotting points in the same style.

mapOPAPoints(demos.sp, X = "lon", Y = "lat", fill = "blue", size = .75, title = "New Orleans Demolitions")

plot of chunk unnamed-chunk-7

Above is a map of all demolitions in New Orleans. Note that while we used a Spatial object, the function can also take a regular data frame, in which case it will automatically use toSpatialPoints().

What if we want to map multiple data sets? This is as simple as making multiple calls to the mapping function, and using the old.map argument. For example, below we map both demolitions and Alcoholic Beverage Outlets (ABOs). Note that we map the ABOs without explicitly converting to a Spatial object.

demo.map <- mapOPAPoints(demos.sp, X = "lon", Y = "lat", fill = "blue", size = .75)

ABOs <- csvFromWeb("https://data.nola.gov/api/views/uiry-as9x/rows.csv?accessType=DOWNLOAD")
ABOs$lat <- as.numeric(sub(".*?\\((.*?),.*", "\\1", ABOs$Location.1))
ABOs$lon <- as.numeric(sub(".*?,(.*?)).*", "\\1", ABOs$Location.1))

mapOPAPoints(ABOs, X = "lon", Y = "lat", fill = "red", size = 1.5, 
  title = "ABOs (red) and \n Demolitions (blue)", old.map = demo.map)

plot of chunk unnamed-chunk-8

Categorical Styling

The City has a number of different demolition programs. If you want to see which programs are active in which areas, you can use the style argument. This takes the name of a column in our data and colors the points based on the entries of that column, as below:

demos.sp$Program <- gsub("SDER \\(NOSD)", "NOSD", demos.sp$Program)
demos.sp$Program <- gsub("SDER", "NOSD", demos.sp$Program)
fill <- c("darkorchid", "darkorange2", "dodgerblue", "chartreuse3")

mapOPAPoints(pts = demos.sp, X = "lon", Y = "lat", style = "Program", size = 2, fill = fill, 
  title = "New Orleans Demolitions \n by Program")

plot of chunk unnamed-chunk-9

Numeric Styling

While these maps are useful for showing where demolitions are happening, its hard to get a sense of the exact magnitudes, since there are so many points. They aren't very useful for answering questions like “Have there been more demolitions in St. Roch or Gentilly?”, or “Which part of New Orleans East has the most demolitions?”. To answer questions like these, we may want to find the total number of demolitions within a certain area. This is done below:

tracts <- getTracts()
tracts$demo.count <- countWithin(within = demos.sp, around = tracts)

mapOPAPoly(geom = tracts, style = "demo.count", fill = rev(heat.colors(2)), 
  title = "Total Number of Demolitions \n per Census Tract")

plot of chunk unnamed-chunk-10

Here, our style column is a numeric rather than a character or factor, resulting in a choropleth map where the coloration is done on a scale from the lowest to highest values. As is often the case with continuous scaling, a lot of details are lost in this map. A small number of tracts have had so many demolitions that they drown out the rest. We can improve our styling with the breaks argument, which allows us to group data into specified clusters. This argument can either be a numeric vector the same length of fill, for manually defined breaks, or a character string with the same options as the classIntervals() function from the classInt package. In this case, the number of classes is defined by the length of fill. Let's see an example of each:

mapOPAPoly(geom = tracts, style = "demo.count", fill = rev(heat.colors(4)), breaks = "quantile", 
    labs = c("Lowest Quartile", "Second Quartile", "Third Quartile", "Top Quartile"), 
    title = "Count of Demolitions \n by Census Tract", piping = TRUE) +
       guides(fill = guide_legend(nrow = 2, byrow = TRUE))

plot of chunk unnamed-chunk-11

mapOPAPoly(geom = tracts, style = "demo.count", fill = rev(heat.colors(5)), breaks = c(-1, 0, 5, 20, 50), 
    title = "Count of Demolitions \n by Census Tract") +
      guides(fill = guide_legend(nrow = 3, byrow = TRUE))

plot of chunk unnamed-chunk-11

Note the use of guides(), a function from the ggplot package, to change the number of rows in our legend. We can do this because these functions draw heavily from the ggplot package. Virtually anything that can be done in ggplot can also be done by mapOPAPoints() and mapOPAPoly().

We also used the above nraphs to show some additional features of the mapping functions. First, note the black border around each tract in the top map. This is done with the option piping = TRUE. Also note that there is no labs argument in the bottom function. If no labels are given, mappers will attempt to make them with the function prettyLabs() (though as the map shows, it doesn't always work perfectly).

Mapping Polygons From Data Frames

mapOPAPoly also includes a useful feature that allows us to merge a data frame with a commonly used set of polygons such as parcels, Census geographies, and Council Districts (any geography with a get... function). For example, below we map the parcels in which demolitions have taken place.

mapOPAPoly(geom = "parcels", poly.dat = demos, id.var = "GEOPIN", title = "Demolished Parcels", alpha = 1)

plot of chunk unnamed-chunk-12

This feature is particularly useful when mapping data from the Census Bureau. Rather than doing a merge for each group of data, the function does it automatically. However, its important to remember that merging with spatial data can be tricky and the merge used by mapOPAPoly() in particular isn't very flexible, so if your data has too many quirks, it's best to do the merge outside of the function and then map the resulting Spatial object.

Interactive Maps

What if we want to explore the data more closely? We want to zoom in to specific areas, and get more details about each demolition like the number of units demolished and the date of demolition. These tasks can be done with interactive leaflet maps, which you can make in R with the leaflet package, which is used in this package with the function mapOPALeaflet(). Below is a function to make a leaflet map (run the function to pull up the map. Warning: takes a while to run).

#demos.sp.sub <- demos.sp[1:500, ]
#  mapOPALeaflet(spatial = demos.sp.sub, title = "New-Orleans-Demolitions", 
#  fields = c("Full_Address", "Program", "Demolition_Start"), 
#  style = "Program", fill = fill, fill.alpha = .9)

We can map multiple layers on a leaflet map using mapOPAMultiLeaflet(), which works similarly to the single-layer version, with a couple of key differences. First, the spatial objects and fields should be put into lists, while all aesthetic arguments should be put in vectors of the same length as the lists. Also, the LeafletR package does not support styling options when using multiple layers, so all features from each layer will appear the same.

ABOs <- csvFromWeb("https://data.nola.gov/api/views/uiry-as9x/rows.csv?accessType=DOWNLOAD")
ABOs$lat <- as.numeric(sub(".*?\\((.*?),.*", "\\1", ABOs$Location.1))
ABOs$lon <- as.numeric(sub(".*?,(.*?)).*", "\\1", ABOs$Location.1))
ABOs.sp <- toSpatialPoints(ABOs, X = "lon", Y = "lat")
#mapOPAMultiLeaflet(spatial.list = list(demos.sp.sub, ABOs.sp), 
#  fields = list(c("Full_Address", "Program", "Demolition_Start"), c("Trade.Name", "Address", "Description")), 
#  title = "ABOs-Demos", spatial.names = c("Demolitions", "Alcoholic-Beverage-Outlets"), 
#  fill = c("black", "#F47940"), color = c("#F47940", "black"), lwd = c(1, 1.5), fill.alpha = c(.75, .9), 
#  geojson.exists = TRUE)

As noted above, Leaflet maps can take a very long time to make. Most of that time is spent writing the .geojson file that gets fed to Leaflet. This is particularly tedious if you've run the function already and just want to make some aesthetic changes. In these cases, you can use the argument geojson.exists = TRUE, which uses the previously written file and just makes your changes. However, be sure not to use the argument if you're changing any of the fields arguments, or if you move to a different working directory, as it will lead to an error.

Another important note on these maps: leaflet doesn't accept certain special characters such as slashes in the data. The function performs data cleaning to get rid of most special characters, but if a dataset has characters that aren't removed, they should be taken out before using the function (and/or the function should be adapted to remove them). This also extendes to the title argument, which does not accept special characters and also doesn't accept spaces (though the data itself can have spaces).

Before going to the final section, it's worth noting that these examples should give a good idea of the mapping utility of mappers, but they are not comprehensive. The mapping functions have lots of different features, both aesthetic and functional, as well as room to be extended. For full details, read the description of each function.

Analyzing Data

Mappers does not have very sophistocated functions for analyzing data, but the tools it does have can go a long way. Two of the most important questions of spatial analysis are how many points are in a region and, conversely, what region is each point in. To answer the first, we have countWithin(), and for the second we have inWhich(). Both are wrappers for common uses of the over() function in the sp package, which works similarly to a Spatial Join in ArcGIS. This function has a number of uses that aren't implemented by this package. For other ways to use it, see help(over).

An example of countWithin() is above, but to go deeper, the function takes a set of points (within) and a set of polygons (around), and counts the number of points inside of each polygon. The function can also be used to find how many points are within a set radius of another set of points, as below where we find the number of demolitions within 1,000 feet of each ABO in the city (the units of range will be the same as the units of your CRS. In this case our CRS is the NAD 83 Louisiana South system, with units in feet).

ABOs.sp$demo.counts <- countWithin(within = demos.sp, around = ABOs.sp, range = 1000)

mapOPAPoints(pts = ABOs.sp, X = "lon", Y = "lat", 
  style = "demo.counts", fill = rev(heat.colors(2)), 
    size = 2.5, title = "Demolitions Near ABOs")

plot of chunk unnamed-chunk-15

Finally, we can also do the reverse and find where something is, using inWhich(). Below, we find which NOFD district the demolition at 1842 Lesseps is in.

NOFD.dists <- zipToSpatial("https://data.nola.gov/api/geospatial/k4eq-x4cs?method=export&format=Shapefile")
## OGR data source with driver: ESRI Shapefile 
## Source: "JunkDataForZips", layer: "NOFD_Districts"
## with 6 features
## It has 4 fields
demos.add <- subset(demos.sp, Full_Address == "1842 LESSEPS ST")
find.dist <- inWhich(around = NOFD.dists, within = demos.add, char = "NAME")
paste0("1842 Lesseps is in the ", find.dist, " NOFD district")
## [1] "1842 Lesseps is in the 3rd NOFD district"