GIS Beyond ArcMap

Ben Bellman
December 5, 2017

Let's start with a discussion

Talk with one or two neighbors, jot down ideas:

  • How do you interact with GIS in daily life outside of class?
  • How do you see organizations use GIS for research and communication?
  • How might you use GIS in your future work or career?
  • Does this happen on ArcGIS or with other platforms?

Why use ArcGIS?

  • Huge range of products, tools, capabilities, specializations
  • Truly an all-in-one software
  • Plenty of technical and custormer support
  • But are there downsides?

Why not use ArcGIS?

  • It's incredibly expensive!!!
    • $800 per year for single “Basic” license
    • $3,000 per year for single “Standard” license
  • Only works on Windows
  • Its code is proprietary
  • Often unstable and inefficient
  • Can't cover absolutely everything

Proprietary vs. Open Source/Freeware

Legal definition of ownership and usage rights

  • Proprietary software is “restricted,” someone owns copyright to code
  • Must agree to terms and conditions to use, often must pay
  • Code is private property
    • Illegal to use or change without permission
    • “Black box”
    • Cannot sell, distribute, or share without permission

Proprietary vs. Open Source/Freeware

  • Open source software is in the public domain
    • No restrictions on usage, changes, distribution
    • Sometimes sold, but often costs nothing to use
    • Copyright still held, but license is totally open
  • Closely related to the “free” software movement

Famous Open Source Software

  • Ubuntu (Linux)
  • Mozilla Firefox (Netscape)
  • Apache HTTP Server
  • Audacity
  • VLC
  • R
  • Python
  • And much, much more

Ethos of Open Software

  • Open source software thrives on community collaboration
  • Relies on network of volunteer contributors
    • (unpaid labor)
  • Copyright owners are typically non-profit organizations
  • No warranties!!!!
  • Set up has real drawbacks for companies and agencies

Open Source GIS

  • OpenStreetMap
    • Essentially open source Google Maps
    • Community-sourced street data for whole planet
    • Open application programming interfaces (APIs) for submitting and using data

Open Source GIS

  • QGIS
    • Essentially open source ArcGIS
    • Contains similar suite of GIS tools developed by team of volunteers
    • Provides Python library for development of new plug-ins
  • R and Python can be used for geoprocessing, spatial analysis, etc.
    • Range of libraries for these purposes
    • PySAL developed by team behind GeoDa (also open source!)
    • R is especially robust for GIS and spatial statistics

GIS in the Command Line

  • ArcGIS and QGIS work by inputing commands for the computer to execute
    • We can enter these commands by clicking options, or directly in command line
    • There is actually a Python library developed by Esri that can access full functionality of ArcGIS software
  • Scripting your own code offers big advantages:
    • Automation of repetitive, complex tasks
    • Perfect replication of tasks
    • Easy to share and learn from other's code
    • Scientific transparency

ModelBuilder in ArcGIS

  • You can explore this yourself in ArcMap!
  • ModelBuilder allows you to build routines with standard point-and-click tools
    • Workflow is visualized as a flowchart as you build
    • Model then automates that workflow across a number of files
    • Can add multiple inputs and outputs
    • Export model as code written with ArcPy library!

What Can R Do For You?

  • R is a computing environment for statistics
    • Specifically designed as a “functional” programming language
    • Emphasizes writing and sharing new functions
  • R is defined by its community of developers and cutting edge techniques
    • Tens of thousands of packages of function on CRAN
    • Official repository of packages meeting certain standard
    • Lots of unoffical packages on repositories like Github
    • Homebrewed functions all over the web

What Can R Do For You?

  • R has fast become a standard for statistical computing
    • Lots of well known data science applications
    • Very marketable skill!
  • Lots of third-party support
    • RStudio is a must have GUI
    • Many companies and websites maintain R packages

Learning and Using R or Python

  • Programming is much less daunting than it seems!
  • Consider a computer science course
    • Python is an excellent multi-purpose language to know
  • Consider a statistics course
    • Look for one that teaches with R
  • Lots of online resources and courses, here are some links
    • Free introductions to data analysis in R and Python at DataCamp with great interactive exercises
    • Check out my RPubs page to see slides from an introductory workshop on R, contains plenty of example code

Let's look at R now!

  • One use is to make quick maps
  • First step is always to load the needed libraries
library(tidyverse)
library(readxl)
library(rgdal)
library(ggmap)

Mapping NYC Stop-and-Frisk Data

  • Available from NYC.gov; I use 2014 file from NYCLU
nyc_sqf <- read_xlsx("~/Google Drive/Computer Backup/R Workshop/Data/2014_sqf_web.xlsx")

select(nyc_sqf, xcoord, ycoord) %>%
  summary()
     xcoord            ycoord      
 Min.   : 914151   Min.   :121420  
 1st Qu.: 990886   1st Qu.:176014  
 Median :1005210   Median :192836  
 Mean   :1005444   Mean   :198626  
 3rd Qu.:1023191   3rd Qu.:221223  
 Max.   :1066574   Max.   :271882  
 NA's   :1650      NA's   :1650    

Mapping NYC Stop-and-Frisk Data

  • The coordinates in the data are from a UTM projection
  • We want latitude/longitude for mapping
lon_lat <- nyc_sqf %>% 
  select(xcoord, ycoord) %>%
  as.matrix() %>%
  project(proj = "+proj=lcc +lat_1=40.66666666666666 +lat_2=41.03333333333333 +lat_0=40.16666666666666 +lon_0=-74 +x_0=300000 +y_0=0 +ellps=GRS80 +datum=NAD83 +to_meter=0.3048006096012192 +no_defs", inv = TRUE)

nyc_sqf$lon <- lon_lat[,1]
nyc_sqf$lat <- lon_lat[,2]

Mapping NYC Stop-and-Frisk Data

select(nyc_sqf, lon, lat) %>%
  summary()
      lon              lat       
 Min.   :-74.25   Min.   :40.50  
 1st Qu.:-73.98   1st Qu.:40.65  
 Median :-73.92   Median :40.70  
 Mean   :-73.92   Mean   :40.71  
 3rd Qu.:-73.86   3rd Qu.:40.77  
 Max.   :-73.70   Max.   :40.91  
 NA's   :1650     NA's   :1650   

Mapping NYC Stop-and-Frisk Data

  • ggmap is an extension of ggplot2 library for graphics
  • Relies on static web map tiles
  • Data still just in a table, not yet spatial
get_map(location = "new york", 
        maptype = "satellite",
        zoom = 11) %>%
  ggmap() +
  coord_cartesian() +
  geom_point(data = nyc_sqf, aes(lon, lat),
             alpha = 0.05, color = "#ce1c1c", na.rm = TRUE)

Mapping NYC Stop-and-Frisk Data

plot of chunk map1_1

Detailed Maps with tmap

  • Can combine data sources for very effective maps
  • Package mixes points, lines, polygons, rasters
  • All with different symbologies
  • Still nowhere near as powerful as ArcMap in customization

Detailed Maps with tmap

library(tmap)

data(Europe, land, rivers, metro)

tm_shape(land) + 
    tm_raster("trees", breaks=seq(0, 100, by=20), legend.show = FALSE) +
tm_shape(Europe, is.master = TRUE) +
    tm_borders() +
tm_shape(rivers) +
    tm_lines(lwd="strokelwd", scale=5, legend.lwd.show = FALSE) +
tm_shape(metro) +
    tm_bubbles("pop2010", "red", border.col = "black", border.lwd=1, 
        size.lim = c(0, 11e6), sizes.legend = c(1e6, 2e6, 4e6, 6e6, 10e6), 
        title.size="Metropolitan Population") +
    tm_text("name", size="pop2010", scale=1, root=4, size.lowerbound = .6, 
        bg.color="white", bg.alpha = .75, 
        auto.placement = 1, legend.size.show = FALSE) + 
tm_format_Europe() +
tm_style_natural()

Detailed Maps with tmap

plot of chunk map3_1

Working with Points

  • Can create points file similar to “Display XY Data” in ArcMap
library(sp)

nyc_sqf <- filter(nyc_sqf, is.na(lon) == F)
coordinates(nyc_sqf) <- select(nyc_sqf, lon, lat)

class(nyc_sqf)
[1] "SpatialPointsDataFrame"
attr(,"package")
[1] "sp"
slotNames(nyc_sqf)
[1] "data"        "coords.nrs"  "coords"      "bbox"        "proj4string"

Working with Points

plot(nyc_sqf)

plot of chunk unnamed-chunk-6

Ripley's K Function

  • Let's take a random day and see if there's clustering in stop-and-frisk locations
  • spatstat has many tools to analyze point patterns
library(spatstat)
library(maptools)

sample <- nyc_sqf[nyc_sqf$datestop==1092014,] %>% as.ppp()
window <- convexhull(sample) %>% as.owin()

sample[window] %>%
  envelope(fun = Kest) %>%
  plot(main = "Clustering in one day of NYPD stops")

Ripley's K Function

Generating 99 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98,  99.

Done.

plot of chunk ripley1

Kernel Density Surface

sample[window] %>%
  density() %>%
  plot(main = "Heat map of stops for one day")

plot of chunk unnamed-chunk-7

Working with Polygons

sa <- readOGR("/Users/benjaminbellman/Dropbox (Brown)/Police/Data/Final Files", "SanAntonio_1015")
OGR data source with driver: ESRI Shapefile 
Source: "/Users/benjaminbellman/Dropbox (Brown)/Police/Data/Final Files", layer: "SanAntonio_1015"
with 366 features
It has 81 fields
Integer64 fields read as strings:  REGIONA DIVISIONA STATEA COUNTYA COUSUBA PLACEA TRACTA CONCITA AIANHHA RES_ONLYA TRUSTA AITSCEA ANRCA CBSAA CSAA METDIVA NECTAA CNECTAA NECTADIVA UAA CDCURRA SLDUA SLDLA ZCTA5A SUBMCDA SDELMA SDSECA SDUNIA PUMA5A BTTRA BLKGRPA BTBGA medhhinc totpop white black asian hisp nonwhite Rowid_ ZONE_CODE COUNT 

Working with Polygons

qtm(sa, fill="divseg", fill.title="")

plot of chunk unnamed-chunk-9

Making a LISA Map

library(spdep)
library(classInt)

weights <- poly2nb(sa) %>% nb2listw()
locm <- localmoran(sa$gini, weights)
sa$Sgini <- scale(sa$gini)
sa$lag <- lag.listw(weights, sa$Sgini)
sa$pval <- locm[, 5]

Making a LISA Map

sa$quad_sig <- ifelse(sa$Sgini >= 0 & sa$lag >= 0 & sa$pval <= 0.05, 1,
                      ifelse(sa$Sgini <= 0 & sa$lag <= 0 & sa$pval <= 0.05, 2,
                      ifelse(sa$Sgini >= 0 & sa$lag <= 0 & sa$pval <= 0.05, 3,
                      ifelse(sa$Sgini >= 0 & sa$lag <= 0 & sa$pval <= 0.05, 4, 5))))

breaks <- seq(1, 5, 1)
labels <- c("High-High", "Low-Low", "High-Low", "Low-High", "Not Signif.")
np <- findInterval(sa$quad_sig, breaks)
colors <- c("red", "blue", "lightpink", "skyblue2", "white")

Making a LISA Map

plot(sa, col = colors[np])
mtext("Local Moran's I\nof Gini Index", cex = 1.5, side = 3, line = 1)
legend("topright", legend = labels, fill = colors, bty = "n")

plot of chunk unnamed-chunk-12