Libraries needed for this section are:
ggmap
readr
sp
rgdal
httr
jsonlite
Data needed:
If you haven’t already, create a directory R_Workshop on your Desktop. Then set R_Workshop as your working directory in RStudio (Session > Set Working Directory > Choose Directory...), and download the files above.
There are a number of ways, for example:
What have we learned?
We can use an API to access a service (or tools) provided by someone else without knowing the details of how this service or tool is implemented.
A geocoding API provides a direct way to access these services via an HTTP request (simply speaking: a URL).
A geocoding service API request must be in a particular form as specified by the service provider.
Geocoding service responses are returned in a structured format, which is typically XML or JSON, sometimes also KMZ.
Our goal now is to do from R what we did in a web browser. For this we also have to take the following into account:
There are many geocoding providers [2]. They vary in terms of format specifications, access and use(!) restrictions, quality of results, and more. So choosing a geocoder for a research project depends on the specifics of your situation.
Load address data. Clean them up if necessary.
Send each address to the geocoding service (typically: create a URL request for each address)
Process results. Extract lat/lon and any other values you are interested in and turn them into a convenient format, usually a table.
Save the output.
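The four steps above can be sketched in R before committing to any particular service. This is only an outline under stated assumptions: fake_geocode is a hypothetical stand-in that returns fixed, made-up coordinates instead of calling a real geocoder, and the two addresses and the output file name are invented for illustration.

```r
# Step 1: load address data and clean it up (toy data instead of a real file)
addr <- data.frame(
  Name = c("Bank A", "Bank B"),
  Address = c("  100 Main St, Philadelphia, PA ", "200 Market St, Philadelphia, PA"),
  stringsAsFactors = FALSE
)
addr$Address <- trimws(addr$Address)

# Hypothetical stand-in for a geocoding service: always returns the same
# made-up coordinates, so the sketch runs without network access.
fake_geocode <- function(address) {
  data.frame(lon = -75.16, lat = 39.95)
}

# Step 2: send each address to the "service"
results <- lapply(addr$Address, fake_geocode)

# Step 3: process the results into one table and attach them to the input
coords <- do.call(rbind, results)
geocoded <- cbind(addr, coords)

# Step 4: save the output
out_file <- file.path(tempdir(), "geocoded.csv")
write.csv(geocoded, out_file, row.names = FALSE)
```

With a real service, only fake_geocode would need to be swapped out; the surrounding steps stay the same.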
We will start by using the geocode command from the ggmap library.
Load the ggmap library.
Using the geocode command, how would you search for the location of the city of Santiago?
(see the output= option of the command)
380 New York St, Redlands, CA
Now let’s write an R script to process an entire list of addresses for geocoding this way. We will use the generic script above and implement it in R:
Load address data.
We use the readr library, which allows us to read the csv into a data frame without downloading it to the desktop, like so:
banks <- read_csv(url("https://www.dropbox.com/s/z0el6vfg1vtmxw5/PhillyBanks_sm.csv?dl=1")) # we need the `readr` library for this!
Send each address to the geocoding service.
geocode can take a vector of addresses. So all we have to do is find out where the addresses are in our banks data frame and then submit them to the function.
banksCoords <- geocode([PUT THE ADDRESS VECTOR HERE])
Process results.
Most of this is already done by the geocode function. We only need to bind the lat/lon coordinates back to our original data frame. We use the cbind function for this, like:
banksCoords <- data.frame(cbind(banks, banksCoords))
Save the output.
We could use write.table, for example. If we wanted to save it as a shapefile, we would need to convert the data frame to a spatial object first, as we did in an earlier session, and then save it with writeOGR.
Thanks to our fabulous geospatial manager Stace Maples, who is tirelessly working to make our GIS lives easier, we have our own geolocator at Stanford.
The services available here cover the US only. The good news is that there is no limit on how many addresses you can throw at this server. However, you should let Stace know if you intend to run a major job!
To use this service:
You need to get a token from here: http://locator.stanford.edu/arcgis/tokens/
Username: add WIN\ before your SunetID, for example: WIN\cengel
Client: RequestIP
HTTP referer: [leave blank]
IP: [leave blank]
Expiration: (you decide)
Format: HTML
(The token is tied to the IP address of the machine that requests the service, so if you use a laptop and move, say, from your home wireless over VPN to your lab on campus, the same token will not work.)
Now let’s put together a URL that will determine the location of 380 New York St, Redlands, CA. [3]
Here is what we need:
The request URL: http://locator.stanford.edu/arcgis/rest/services/geocode/Composite_NorthAmerica/GeocodeServer/geocodeAddresses
The request parameters. Required are addresses=, token=, and f= (the output format).
ArcGIS also requires the input addresses to be in JSON format, which means they need to look like this:
addresses=
{
"records": [
{
"attributes": {
"OBJECTID": 1,
"SingleLine": "380 New York St., Redlands, CA, 92373"
}
}
]
}
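Typing this JSON by hand gets tedious for more than one address. As a rough sketch, assuming the jsonlite library is installed, the same structure can be generated from a character vector of addresses. The second address below is made up for illustration.

```r
library(jsonlite)

addresses <- c("380 New York St., Redlands, CA, 92373",
               "100 Main St, Springfield, IL")  # second address is illustrative only

# one record per address, numbered by OBJECTID
records <- lapply(seq_along(addresses), function(i) {
  list(attributes = list(OBJECTID = i, SingleLine = addresses[i]))
})

# auto_unbox = TRUE writes length-one vectors as plain values, not arrays
payload <- toJSON(list(records = records), auto_unbox = TRUE)
print(payload)
```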
We attach all the request parameters to the request URL after a ?
That makes for this very convoluted URL:
http://locator.stanford.edu/arcgis/rest/services/geocode/Composite_NorthAmerica/GeocodeServer/geocodeAddresses?addresses={"records":[{"attributes":{"OBJECTID":1,"SingleLine":"380 New York St., Redlands, CA"}}]}&token=<YOUR TOKEN>&f=pjson
What a mess.
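Rather than typing such a URL by hand, it can be assembled in R. The following is a minimal sketch using only base R; the variable names are arbitrary and YOUR_TOKEN is a placeholder, not a working token.

```r
# server and endpoint as given in the text
gserver <- "http://locator.stanford.edu/arcgis/rest/services/geocode/Composite_NorthAmerica/GeocodeServer/"
endpoint <- "geocodeAddresses"

# the addresses parameter as a JSON string
addr_json <- '{"records":[{"attributes":{"OBJECTID":1,"SingleLine":"380 New York St., Redlands, CA"}}]}'

# placeholder only: replace with your own token
my_token <- "YOUR_TOKEN"

# percent-encode each parameter value separately, then glue the pieces together
request_url <- paste0(
  gserver, endpoint,
  "?addresses=", URLencode(addr_json, reserved = TRUE),
  "&token=", URLencode(my_token, reserved = TRUE),
  "&f=pjson"
)
print(request_url)
```

Encoding each value on its own keeps characters such as & or # inside an address from being read as part of the URL syntax.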
ArcGIS takes addresses in single-line and multiline mode. The addresses in your table can be stored in a single field (as used above) or in multiple fields, one for each address component (Street, City, etc.). Batch geocoding performance is better when the address parts are stored in separate fields (multiline). However, if there is an error in your batch, all the addresses in that batch that have already been geocoded will be dropped.
Now let’s run the same addresses from above with the ArcGIS geocoder.
Here, again, are our steps.
Load address data.
Like above. Check.
Send each address to the geocoding service.
For this service we don’t have a convenient function to do this, so we have to write our own.
Process results.
We will do this in the same function. Here it is:
## begin geocode function
## takes a token and one address at a time as a single line (SingleLine API)
## needs more work for errors: e.g. what if no results are returned? etc.
geocodeSL <- function(address, token) {
    # load the libraries
    require(httr)
    require(jsonlite)
    # the server URL
    gserver <- "http://locator.stanford.edu/arcgis/rest/services/geocode/Composite_NorthAmerica/GeocodeServer/"
    # template for SingleLine format
    pref <- "{'records':[{'attributes':{'OBJECTID':1,'SingleLine':'"
    suff <- "'}}]}"
    # make a valid URL
    url <- URLencode(paste0(gserver, "geocodeAddresses?addresses=", pref, address,
        suff, "&token=", token, "&f=json"))
    # submit the request
    rawdata <- GET(url)
    # parse JSON to get the content
    res <- content(rawdata, "parsed", "application/json")
    # process the result: take the first location and keep selected attributes
    resdf <- with(res$locations[[1]], {
        data.frame(lat = attributes$Y, lon = attributes$X, status = attributes$Status,
            score = attributes$Score, side = attributes$Side, matchAdr = attributes$Match_addr)
    })
    # return as data frame
    return(resdf)
}
## end geocode function
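The comments in the function above flag one open issue: what if no results are returned? One possible way to handle that is to move the result processing into a helper that returns a row of NA values instead of failing. The sketch below assumes that an unmatched request comes back with an empty or missing locations element, which has not been verified against the server. parseLocation is a hypothetical name, and the mock responses (including the coordinates) are invented for illustration.

```r
parseLocation <- function(res) {
  # no usable result: return one row of NAs so rbind still works later
  if (is.null(res$locations) || length(res$locations) == 0) {
    return(data.frame(lat = NA_real_, lon = NA_real_, status = NA_character_,
                      score = NA_real_, side = NA_character_,
                      matchAdr = NA_character_, stringsAsFactors = FALSE))
  }
  a <- res$locations[[1]]$attributes
  data.frame(lat = a$Y, lon = a$X, status = a$Status, score = a$Score,
             side = a$Side, matchAdr = a$Match_addr, stringsAsFactors = FALSE)
}

# mock parsed responses, shaped like the list that content() returns above
hit <- list(locations = list(list(attributes = list(
  Y = 34.05, X = -117.19, Status = "M", Score = 100,
  Side = "", Match_addr = "380 New York St, Redlands, CA"))))
miss <- list(locations = list())

rbind(parseLocation(hit), parseLocation(miss))
```

Inside geocodeSL, the with(...) block could then be replaced by a call to such a helper, so that a missing match shows up as an NA row rather than an error.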
I have uploaded this function here, so to use it from within R, you can “source” it like this:
source("https://www.dropbox.com/s/k520ukglnrhzyj3/geocodeSL.R?dl=1")
Unfortunately, this geocoding function is not as convenient as the one we used earlier, so we have to loop through our addresses ourselves and save the result to a data frame. Before that, set myToken to the value of your token and make sure that you have the httr and jsonlite libraries installed.
Once that’s taken care of, we can do:
banksCoords <- do.call("rbind", sapply(banks$Address, function(x) geocodeSL(x, myToken),
    simplify = FALSE))
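One failed request would stop the whole loop above. A possible safeguard is to wrap each call in tryCatch and record a row of NA values when a request fails. The sketch below uses a hypothetical stand-in, flakyGeocode, in place of geocodeSL so that it runs without a token or network access; the coordinates and addresses are made up.

```r
# hypothetical stand-in for geocodeSL: fails on empty input, otherwise
# returns made-up coordinates
flakyGeocode <- function(address, token) {
  if (!nzchar(address)) stop("empty address")
  data.frame(lat = 39.95, lon = -75.16)
}

# wrapper that turns an error into a row of NAs plus a message
safeGeocode <- function(address, token) {
  tryCatch(
    flakyGeocode(address, token),
    error = function(e) {
      message("could not geocode '", address, "': ", conditionMessage(e))
      data.frame(lat = NA_real_, lon = NA_real_)
    }
  )
}

addresses <- c("100 Main St, Philadelphia, PA", "")  # made-up test input
coords <- do.call("rbind", lapply(addresses, safeGeocode, token = "dummy"))
coords
```

To use this with the real service, flakyGeocode would be replaced by geocodeSL and the NA row extended to the same columns that geocodeSL returns.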
Using geocodeSL, try to geocode the same address table as above.
The open Data Science Toolkit (DSK) is available as a self-contained Vagrant VM or EC2 AMI that you can deploy yourself. It includes a Google-style geocoder which emulates Google’s geocoding API. This API uses data from the US Census and OpenStreetMap, along with code from GeoIQ and Schuyler Erle’s Modular Street Address Geocoder.
Instructions for how to run DSK on Amazon or Vagrant are here: http://www.datasciencetoolkit.org/developerdocs#amazon
Note that geocode from ggmap also has the option to access DSK, but it will use their public server, which is often slow or unavailable.
If you are interested in doing this in R, see here: https://github.com/cengel/r_IPgeocode