With over 18,000 packages documented in CRAN, it hard to find a
reason to use another piece of software other than R to complete
statistical analyses, including spatial ones. It is true that most
spatial analyses one may desire to use can be done solely in R with
packages such as sp, sf, raster,
rasterVis, gstat, maptools,
spatstat, and many others, but there may be some analyses
or tools that a user may be more familiar with in the ArcGIS setting, or
that they simply cannot easily find within an R package already. For
those that are having a hard time giving up their proprietary software
but only want to work within one program, I will demonstrate how to take
advantage of the R-Python-ArcGIS bridge. This document will provide
instructions for working with the bridge, which is necessary to perform
the functions of toolboxes in ArcGIS Pro within R by integrating R
scripts with Python.
Although there is an R package that creates a bridge with ArcGIS
already (arcgisbinding), this package does not provide
access to toolboxes, and was created for those that want to transfer
data from ArcGIS to R, utilize the statistical analysis functionalities
of R, then transfer the data back to ArcGIS Pro for mapping, publishing,
and sharing. You can learn more about arcgisbinding here.
The bridge we will be working with is the opposite of that described by
the arcgisbinding package. To crudely describe the process,
we will be starting with data in R, “sending” it to ArcGIS to analyze,
then transferring it back to R for further analyses, mapping, and
reporting.
As an example for taking advantage of the R-Python-ArcGIS bridge, we will be creating hot spot maps with the “Optimized Hot Spot Analysis” tool documented in ArcGIS Pro. The tool identifies statistically significant spatial clusters of high values (hot spots) and low values (cold spots). It automatically aggregates incident data, identifies an appropriate scale of analysis, and corrects for both multiple testing and spatial dependence. In short, the tool determines settings that will yield optimal results for you. To read more about the “Optimized Hot Spot Analysis” tool and how it works, visit the ArcGIS Pro website.
The bridge I describe in this document is fragile. Through trial and error, I have discovered that there are many opportunities for the tools to fail if conditions are not met perfectly. Therefore, I will be providing the exact versions of the software and packages I used for the bridge to work on my local machine but acknowledge that it may not work on another user’s machine. Because ArcGIS Pro is a proprietary software, you will need to purchase a license before installing it. For the sake of documentation, I am using an “Advanced” license in this example, but a “Basic” or “Standard” license should also work just fine.
To follow along exactly with the instructions provided, you must use the following softwares and versions:
R 4.2.3
RStudio 2023.12
ArcGIS Pro 3.2.1 (active license required)
You may note that I did not list Python as a software needed to run
the tools, although Python 3.9 is indeed used. ArcGIS Pro comes with
Python when it is installed and is the primary language for automation
in ArcGIS Pro, so it does not need to be installed separately. All of
the tools are run in Python, and we will be using ArcGIS Pro’s Python
interpreter inside of R to access the arcpy package, which
contains the methods for tools. It is of note that arcpy
cannot be used unless there is an active license and downloaded copy of
ArcGIS Pro on the machine running the scripts.
Please note that you must be familiar with R, Python, and ArcGIS Pro to follow along with complete understanding.
For the R project in which you plan on utilizing the bridge, I recommend creating a new project file (.aprx) in ArcGIS Pro that will be entirely dedicated to your R project. In this example, I will be calling mine “Working Bridge.aprx”. I do not recommend using an already established project file, as it will be harder to debug your script if any settings are accidentally changed that may affect how tools run. I would next recommend opening a new map in your ArcGIS Pro project for your own use in testing and debugging.
We will now need to gain access to the Python interpreter in order to
run the arcpy module. The interpreter is the python.exe
application inside of ArcGIS Pro. Instead of searching through your file
browser, open the Python window inside of the ArcGIS Pro application. If
you are unsure of where to locate this window, navigate to the
“Analysis” header. In the “Geoprocessing” section, click on the Python
button and select “Python Window” from the dropdown menu. In the Python
window that pops up, type:
sys.prefix
What is returned is a path where you can find the Python application. For example, the file that is returned to me is:
C:/Program Files/ArcGIS/Pro/bin/Python/envs/arcgispro-py3
The application should be in the “arcgispro-py3” folder:
C:/Program Files/ArcGIS/Pro/bin/Python/envs/arcgispro-py3/python.exe
Copy this path, but your version of it.
Back in RStudio, we need to apply the interpreter. At the top of the window, select “Tools” then “Global Options…”. On the left-hand side, select “Python”. In the “Python interpreter” input box, paste the path to the Python application that you copied from the previous steps. You may need to remove the quotation marks that might be pasted along with the path. The interpreter I used at the time of writing is a Conda environment, Python version 3.9.18. Next, press “Apply”. Your R session will have to restart. You can check that this setting was saved by going back to the page where you entered the path and ensuring it is still in the box.
Complete any steps needed to pre-process your data before writing it to a .csv file.
I am keeping all of the non-ArcGIS Pro files used inside of my working directory for easy access.
For the example, I will be using spatial data included with with the
R package spData.
library(spData)
# Import the data set and take a look at the attributes
data <- spData::nc.sids
head(data)## CNTY.ID BIR74 SID74 NWBIR74 BIR79 SID79 NWBIR79 east north x
## Ashe 1825 1091 1 10 1364 0 19 164 176 -81.67
## Alleghany 1827 487 0 10 542 3 12 183 182 -50.06
## Surry 1828 3188 5 208 3616 6 260 204 174 -16.14
## Currituck 1831 508 1 123 830 2 145 461 182 406.01
## Northampton 1832 1421 9 1066 1606 3 1197 385 176 281.10
## Hertford 1833 1452 7 954 1838 5 1237 411 176 323.77
## y lon lat L.id M.id
## Ashe 4052.29 -81.48594 36.43940 1 2
## Alleghany 4059.70 -81.14061 36.52443 1 2
## Surry 4043.76 -80.75312 36.40033 1 2
## Currituck 4035.10 -76.04892 36.45655 1 4
## Northampton 4029.75 -77.44057 36.38799 1 4
## Hertford 4028.10 -76.96474 36.38189 1 4
The nc.sids data set contains data on sudden infant
deaths in North Carolina. We will be using the “SID79” attribute in our
analysis. This attribute describes the number of deaths due to sudden
infant death syndrome from 1979-1984 in each county.
The next block of code uses the reticulate package and
the py_run_string() function to evaluate Python code.
I would like to note that there are other ways to run strings of Python within R, however this way has been working for me and allows me to see each line more clearly than some of the other ways of running Python in R. If you do not need to run large amounts of Python, this method works just fine.
The “grid” I am using to join the data to is a shapefile containing the boundaries to all of North Carolina’s counties. You can download the files from NC OneMap. Each point will be joined to each county based on their location. If there is more than one point in a certain county, the average of the Gi score will be taken in order to classify that county’s hot spot status. If there is no data reported for a county, it will not be included in the final shapefile of the hot spots.
Note that you can use any kind of neighboring polygon shapefile as your grid. For evenly spaced square or hexagon-shaped grids, I would recommend creating one in ArcGIS Pro using the “Create Fishnet” Geoprocessing tool.
library(reticulate)
py_run_string("import arcpy")
# Set environment settings (e.g., workspace, output coordinate system, allow file overwriting)
py_run_string(
"arcpy.env.workspace = 'C:/Users/Path/To/Your/Project/File/Working Bridge'
arcpy.env.outputCoordinateSystem = arcpy.SpatialReference(4326)
arcpy.env.overwriteOutput = True"
)
# Import data and convert table to .shp file
py_run_string(
"input_csv = 'C:/Users/Path/To/Your/Project/File/Working Bridge/data.csv'
output_feature_class = 'point_data'
arcpy.management.XYTableToPoint(input_csv, output_feature_class, 'lon', 'lat')"
)
# Define input path, output path, and analysis field for the hotspot tool
py_run_string(
"input_feature_class = 'point_data.shp'
output_hotspot_result = 'hotspot_point.shp'
analysis_field = 'SID79'"
)
# Run Optimized Hot Spot Analysis tool
py_run_string(
"arcpy.stats.OptimizedHotSpotAnalysis(
input_feature_class,
output_hotspot_result,
Analysis_Field = analysis_field,
Incident_Data_Aggregation_Method = 'COUNT_INCIDENTS_WITHIN_FISHNET_POLYGONS')"
)
# Add input/outputs of the grid and points for the spatial join tool
py_run_string(
"nc_grid_shp = 'C:/Users/Path/To/Your/Project/File/Working Bridge/North_Carolina_State_and_County_Boundary_Polygons.shp'
target_features = 'North_Carolina_State_and_County_Boundary_Polygons.shp'
join_features = 'hotspot_point.shp'
out_feature_class = 'joinedgrid.shp'"
)
# Create field mappings for the spatial join
py_run_string(
"fieldmappings = arcpy.FieldMappings()
fieldmappings.addTable(target_features)
fieldmappings.addTable(join_features)
GiBinFieldIndex = fieldmappings.findFieldMapIndex('Gi_Bin')
fieldmap = fieldmappings.getFieldMap(GiBinFieldIndex)
field = fieldmap.outputField
field.name = 'mean Gi Bin'
field.aliasName = 'mean Gi Bin'
fieldmap.outputField = field
fieldmap.mergeRule = 'mean'
fieldmappings.replaceFieldMap(GiBinFieldIndex, fieldmap)"
)
# Run the spatial join tool
py_run_string(
"arcpy.analysis.SpatialJoin(
target_features,
join_features,
out_feature_class,
field_mapping = fieldmappings,
match_option = 'COMPLETELY_CONTAINS')"
)
# Only keep cells that contain data
py_run_string(
"input_shapefile = 'joinedgrid.shp'
output_shapefile = 'selectedgrid.shp'
expression = 'Join_Count > 0'
arcpy.Select_analysis(input_shapefile, output_shapefile, expression)"
)In the results of the next code chunk, you can see that after running the hot spot analysis, ArcGIS Pro added some additional attributes, including the Gi z-score, Gi p-value, and Gi bin for each cell. The bin number is assigned to each cell based on the values of the z-score and p-value. For example, a cell with a positive z-score and a p-value less than or equal to 0.01 is assigned to bin 3, which we classify as a hot spot with 99% confidence. The z-score controls the “hot” or “cold” designation, while the the p-value denotes the significance.
library(dplyr)
library(sf)
# Load the hot spot shapefile made in ArcGIS Pro
hotspots_grid <- read_sf(dsn = 'C:/Users/Path/To/Your/Project/File/Working Bridge',
layer = "selectedgrid")
# Create custom classifications based on the binned Gi scores
hotspots_df <- hotspots_grid %>%
mutate(classification = case_when(
mean_Gi_Bi == 3 ~ "Hot Spot with 99% Confidence",
mean_Gi_Bi == 2 ~ "Hot Spot with 95% Confidence",
mean_Gi_Bi == 1 ~ "Hot Spot with 90% Confidence",
mean_Gi_Bi == -3 ~ "Cold Spot with 99% Confidence",
mean_Gi_Bi == -2 ~ "Cold Spot with 95% Confidence",
mean_Gi_Bi == -1 ~ "Cold Spot with 90% Confidence",
mean_Gi_Bi == 0 ~ "Not significant"
)) %>%
mutate(classification = factor(classification,
levels = c("Hot Spot with 99% Confidence",
"Hot Spot with 95% Confidence",
"Hot Spot with 90% Confidence",
"Not significant",
"Cold Spot with 90% Confidence",
"Cold Spot with 95% Confidence",
"Cold Spot with 99% Confidence")))
# View the hot spot analysis results (located in the last columns)
head(as.data.frame(hotspots_df[c(13:17,19)]))## SID79 GiZScore GiPValue NNeighbors mean_Gi_Bi
## 1 2 -2.2381623 0.025210472 27 -1
## 2 2 -1.5080110 0.131551715 33 0
## 3 5 0.7368177 0.461233235 48 0
## 4 7 0.6518782 0.514479746 41 0
## 5 9 2.6383697 0.008330570 45 2
## 6 20 2.7919482 0.005239174 51 2
## classification
## 1 Cold Spot with 90% Confidence
## 2 Not significant
## 3 Not significant
## 4 Not significant
## 5 Hot Spot with 95% Confidence
## 6 Hot Spot with 95% Confidence
Since the “grid” I used for the spatial join only contains the boundaries for the counties of North Carolina, I am going to add a layer that includes the surrounding states to provide more context. Similar shapefiles of the United States can be found on the United States Census Bureau’s website. Note that you may need to adjust some parameters in the following code block to match the attributes in your data set.
We are also going to make the map easier to read by adding a color
palette with reds for “hot spots” and blues for “cold spots”. I am using
ggplot2 for mapping here, but you can use another mapping
package such as leaflet if you’d like.
library(ggplot2)
# Create a corresponding, diverging color palette
colpal <- c("#b2182b", "#ef8a62", "#fddbc7", "#f7f7f7", "#d1e5f0", "#67a9cf", "#2166ac")
# Specify the levels of classification for use in the plot
gilevels <- c("Hot Spot with 99% Confidence",
"Hot Spot with 95% Confidence",
"Hot Spot with 90% Confidence",
"Not significant",
"Cold Spot with 90% Confidence",
"Cold Spot with 95% Confidence",
"Cold Spot with 99% Confidence")
# Load shapefile of the United States
USA <- read_sf(dsn = "./shapefiles", layer = "CB_USStates_WGS84")
# Create a bounding box of North Carolina to adjust the view of the map
bbox_usa <- USA %>%
filter(NAME == "North Carolina") %>%
st_bbox() %>%
as.list()
# Plot the data using ggplot
hotspot.map <- ggplot() +
geom_sf(data = USA,
fill = "palegreen4",
colour = "black") +
geom_sf(data = hotspots_df, aes(fill = classification),
color = "black",
lwd = 0.1) +
scale_fill_manual(name = "Hot Spots",
values = colpal,
breaks = gilevels) +
coord_sf(xlim = c(bbox_usa$xmin, bbox_usa$xmax),
ylim = c(bbox_usa$ymin, bbox_usa$ymax)) +
theme_classic() +
theme(panel.background = element_rect(fill = "white"),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
legend.position = "bottom",
legend.title = element_text(size=8), legend.text = element_text(size=6),
legend.background = element_blank())
# Print the map
print(hotspot.map)The result should look something like this:
The finished map
can be exported as a .png file, pasted in an
RMarkdown
document, integrated in an officer report, and more. You
can use the same steps in a loop, since the ArcGIS Pro files will
overwrite (based on the setup parameters), creating multiple maps at
once for use in reports or other projects.
Inevitably, the bridge will break at some point in time. These
failures are usually due to incompatible versions or new updates
available in ArcGIS Pro. Occasionally, the parameters for a tool will
change as well. Start by ensuring that you have an active ArcGIS Pro
license. Update your ArcGIS Pro software if it needs to be updated.
Check your interpreter to ensure that the path to “arcgispro-py3” is
still viable. If you cloned your active environment in ArcGIS Pro to use
as the interpreter, try using the default active environment. Check the
documentation for running the tools in Python by visiting each tool’s
ArcGIS webpage and looking through the Python Parameters box. Check if
the reticulate package in R needs updating. As a last
resort, update your version of R and/or RStudio if need be. If the
problems persist to no avail, consult different forums, github, or
Esri.
This tutorial was created by a non-expert for educational purposes only. While efforts have been made to ensure the accuracy of the information provided, users should verify any information before applying it to their own analysis. The author accepts no responsibility or liability for any errors or omissions, or for any damage or loss arising from the use of this tutorial.