Using ArcGIS Pro Tools in R with Python

Introduction

With over 18,000 packages documented in CRAN, it hard to find a reason to use another piece of software other than R to complete statistical analyses, including spatial ones. It is true that most spatial analyses one may desire to use can be done solely in R with packages such as sp, sf, raster, rasterVis, gstat, maptools, spatstat, and many others, but there may be some analyses or tools that a user may be more familiar with in the ArcGIS setting, or that they simply cannot easily find within an R package already. For those that are having a hard time giving up their proprietary software but only want to work within one program, I will demonstrate how to take advantage of the R-Python-ArcGIS bridge. This document will provide instructions for working with the bridge, which is necessary to perform the functions of toolboxes in ArcGIS Pro within R by integrating R scripts with Python.

Although there is an R package that creates a bridge with ArcGIS already (arcgisbinding), this package does not provide access to toolboxes, and was created for those that want to transfer data from ArcGIS to R, utilize the statistical analysis functionalities of R, then transfer the data back to ArcGIS Pro for mapping, publishing, and sharing. You can learn more about arcgisbinding here. The bridge we will be working with is the opposite of that described by the arcgisbinding package. To crudely describe the process, we will be starting with data in R, “sending” it to ArcGIS to analyze, then transferring it back to R for further analyses, mapping, and reporting.

As an example for taking advantage of the R-Python-ArcGIS bridge, we will be creating hot spot maps with the “Optimized Hot Spot Analysis” tool documented in ArcGIS Pro. The tool identifies statistically significant spatial clusters of high values (hot spots) and low values (cold spots). It automatically aggregates incident data, identifies an appropriate scale of analysis, and corrects for both multiple testing and spatial dependence. In short, the tool determines settings that will yield optimal results for you. To read more about the “Optimized Hot Spot Analysis” tool and how it works, visit the ArcGIS Pro website.

Software and Package Requirements

The bridge I describe in this document is fragile. Through trial and error, I have discovered that there are many opportunities for the tools to fail if conditions are not met perfectly. Therefore, I will be providing the exact versions of the software and packages I used for the bridge to work on my local machine but acknowledge that it may not work on another user’s machine. Because ArcGIS Pro is a proprietary software, you will need to purchase a license before installing it. For the sake of documentation, I am using an “Advanced” license in this example, but a “Basic” or “Standard” license should also work just fine.

To follow along exactly with the instructions provided, you must use the following softwares and versions:

R 4.2.3
RStudio 2023.12
ArcGIS Pro 3.2.1 (active license required)

You may note that I did not list Python as a software needed to run the tools, although Python 3.9 is indeed used. ArcGIS Pro comes with Python when it is installed and is the primary language for automation in ArcGIS Pro, so it does not need to be installed separately. All of the tools are run in Python, and we will be using ArcGIS Pro’s Python interpreter inside of R to access the arcpy package, which contains the methods for tools. It is of note that arcpy cannot be used unless there is an active license and downloaded copy of ArcGIS Pro on the machine running the scripts.

Please note that you must be familiar with R, Python, and ArcGIS Pro to follow along with complete understanding.

Setting up ArcGIS Pro

For the R project in which you plan on utilizing the bridge, I recommend creating a new project file (.aprx) in ArcGIS Pro that will be entirely dedicated to your R project. In this example, I will be calling mine “Working Bridge.aprx”. I do not recommend using an already established project file, as it will be harder to debug your script if any settings are accidentally changed that may affect how tools run. I would next recommend opening a new map in your ArcGIS Pro project for your own use in testing and debugging.

We will now need to gain access to the Python interpreter in order to run the arcpy module. The interpreter is the python.exe application inside of ArcGIS Pro. Instead of searching through your file browser, open the Python window inside of the ArcGIS Pro application. If you are unsure of where to locate this window, navigate to the “Analysis” header. In the “Geoprocessing” section, click on the Python button and select “Python Window” from the dropdown menu. In the Python window that pops up, type:

sys.prefix

What is returned is a path where you can find the Python application. For example, the file that is returned to me is:

C:/Program Files/ArcGIS/Pro/bin/Python/envs/arcgispro-py3

The application should be in the “arcgispro-py3” folder:

C:/Program Files/ArcGIS/Pro/bin/Python/envs/arcgispro-py3/python.exe

Copy this path, but your version of it.

Setting up R

Back in RStudio, we need to apply the interpreter. At the top of the window, select “Tools” then “Global Options…”. On the left-hand side, select “Python”. In the “Python interpreter” input box, paste the path to the Python application that you copied from the previous steps. You may need to remove the quotation marks that might be pasted along with the path. The interpreter I used at the time of writing is a Conda environment, Python version 3.9.18. Next, press “Apply”. Your R session will have to restart. You can check that this setting was saved by going back to the page where you entered the path and ensuring it is still in the box.

Pre-processing the data

Complete any steps needed to pre-process your data before writing it to a .csv file.

I am keeping all of the non-ArcGIS Pro files used inside of my working directory for easy access.

For the example, I will be using spatial data included with with the R package spData.

library(spData)

# Import the data set and take a look at the attributes
data <- spData::nc.sids
head(data)

##             CNTY.ID BIR74 SID74 NWBIR74 BIR79 SID79 NWBIR79 east north      x
## Ashe           1825  1091     1      10  1364     0      19  164   176 -81.67
## Alleghany      1827   487     0      10   542     3      12  183   182 -50.06
## Surry          1828  3188     5     208  3616     6     260  204   174 -16.14
## Currituck      1831   508     1     123   830     2     145  461   182 406.01
## Northampton    1832  1421     9    1066  1606     3    1197  385   176 281.10
## Hertford       1833  1452     7     954  1838     5    1237  411   176 323.77
##                   y       lon      lat L.id M.id
## Ashe        4052.29 -81.48594 36.43940    1    2
## Alleghany   4059.70 -81.14061 36.52443    1    2
## Surry       4043.76 -80.75312 36.40033    1    2
## Currituck   4035.10 -76.04892 36.45655    1    4
## Northampton 4029.75 -77.44057 36.38799    1    4
## Hertford    4028.10 -76.96474 36.38189    1    4

The nc.sids data set contains data on sudden infant deaths in North Carolina. We will be using the “SID79” attribute in our analysis. This attribute describes the number of deaths due to sudden infant death syndrome from 1979-1984 in each county.

# Complete any pre-processing

# Write the data frame to a .csv file within the ArcGIS Pro project file
write.csv(data, 'C:/Users/Path/To/Your/Project/File/Working Bridge/data.csv')

Running the tool with Python

The next block of code uses the reticulate package and the py_run_string() function to evaluate Python code.

I would like to note that there are other ways to run strings of Python within R, however this way has been working for me and allows me to see each line more clearly than some of the other ways of running Python in R. If you do not need to run large amounts of Python, this method works just fine.

The “grid” I am using to join the data to is a shapefile containing the boundaries to all of North Carolina’s counties. You can download the files from NC OneMap. Each point will be joined to each county based on their location. If there is more than one point in a certain county, the average of the Gi score will be taken in order to classify that county’s hot spot status. If there is no data reported for a county, it will not be included in the final shapefile of the hot spots.

Note that you can use any kind of neighboring polygon shapefile as your grid. For evenly spaced square or hexagon-shaped grids, I would recommend creating one in ArcGIS Pro using the “Create Fishnet” Geoprocessing tool.

library(reticulate)

py_run_string("import arcpy")

# Set environment settings (e.g., workspace, output coordinate system, allow file overwriting)
py_run_string(
  "arcpy.env.workspace = 'C:/Users/Path/To/Your/Project/File/Working Bridge'
   arcpy.env.outputCoordinateSystem = arcpy.SpatialReference(4326)
   arcpy.env.overwriteOutput = True"
  )

# Import data and convert table to .shp file
py_run_string(
  "input_csv = 'C:/Users/Path/To/Your/Project/File/Working Bridge/data.csv'
   output_feature_class = 'point_data'
   arcpy.management.XYTableToPoint(input_csv, output_feature_class, 'lon', 'lat')"
  )

# Define input path, output path, and analysis field for the hotspot tool
py_run_string(
  "input_feature_class = 'point_data.shp'
   output_hotspot_result = 'hotspot_point.shp'
   analysis_field = 'SID79'"
  )

# Run Optimized Hot Spot Analysis tool
py_run_string(
  "arcpy.stats.OptimizedHotSpotAnalysis(
   input_feature_class, 
   output_hotspot_result, 
   Analysis_Field = analysis_field, 
   Incident_Data_Aggregation_Method = 'COUNT_INCIDENTS_WITHIN_FISHNET_POLYGONS')"
  )

# Add input/outputs of the grid and points for the spatial join tool
py_run_string(
  "nc_grid_shp = 'C:/Users/Path/To/Your/Project/File/Working Bridge/North_Carolina_State_and_County_Boundary_Polygons.shp'
   target_features = 'North_Carolina_State_and_County_Boundary_Polygons.shp'
   join_features = 'hotspot_point.shp'
   out_feature_class = 'joinedgrid.shp'"
  )

# Create field mappings for the spatial join 
py_run_string(
  "fieldmappings = arcpy.FieldMappings()
   fieldmappings.addTable(target_features)
   fieldmappings.addTable(join_features)
   GiBinFieldIndex = fieldmappings.findFieldMapIndex('Gi_Bin')
   fieldmap = fieldmappings.getFieldMap(GiBinFieldIndex)
   field = fieldmap.outputField
   field.name = 'mean Gi Bin'
   field.aliasName = 'mean Gi Bin'
   fieldmap.outputField = field
   fieldmap.mergeRule = 'mean'
   fieldmappings.replaceFieldMap(GiBinFieldIndex, fieldmap)"
  )

# Run the spatial join tool
py_run_string(
  "arcpy.analysis.SpatialJoin(
   target_features, 
   join_features, 
   out_feature_class, 
   field_mapping = fieldmappings, 
   match_option = 'COMPLETELY_CONTAINS')"
  )

# Only keep cells that contain data
py_run_string(
  "input_shapefile = 'joinedgrid.shp'
   output_shapefile = 'selectedgrid.shp'
   expression = 'Join_Count > 0'
   arcpy.Select_analysis(input_shapefile, output_shapefile, expression)"
  )

Mapping the results in R

In the results of the next code chunk, you can see that after running the hot spot analysis, ArcGIS Pro added some additional attributes, including the Gi z-score, Gi p-value, and Gi bin for each cell. The bin number is assigned to each cell based on the values of the z-score and p-value. For example, a cell with a positive z-score and a p-value less than or equal to 0.01 is assigned to bin 3, which we classify as a hot spot with 99% confidence. The z-score controls the “hot” or “cold” designation, while the the p-value denotes the significance.

library(dplyr)
library(sf)

# Load the hot spot shapefile made in ArcGIS Pro
hotspots_grid <- read_sf(dsn = 'C:/Users/Path/To/Your/Project/File/Working Bridge', 
                         layer = "selectedgrid")

# Create custom classifications based on the binned Gi scores
hotspots_df <- hotspots_grid %>%
  mutate(classification = case_when(
    mean_Gi_Bi == 3 ~ "Hot Spot with 99% Confidence",   
    mean_Gi_Bi == 2 ~ "Hot Spot with 95% Confidence",   
    mean_Gi_Bi == 1 ~ "Hot Spot with 90% Confidence",    
    mean_Gi_Bi == -3 ~ "Cold Spot with 99% Confidence",  
    mean_Gi_Bi == -2 ~ "Cold Spot with 95% Confidence",  
    mean_Gi_Bi == -1 ~ "Cold Spot with 90% Confidence",   
    mean_Gi_Bi == 0 ~ "Not significant"
  )) %>%
  mutate(classification = factor(classification, 
                                 levels = c("Hot Spot with 99% Confidence", 
                                            "Hot Spot with 95% Confidence", 
                                            "Hot Spot with 90% Confidence", 
                                            "Not significant", 
                                            "Cold Spot with 90% Confidence", 
                                            "Cold Spot with 95% Confidence", 
                                            "Cold Spot with 99% Confidence")))

# View the hot spot analysis results (located in the last columns)
head(as.data.frame(hotspots_df[c(13:17,19)]))

##   SID79   GiZScore    GiPValue NNeighbors mean_Gi_Bi
## 1     2 -2.2381623 0.025210472         27         -1
## 2     2 -1.5080110 0.131551715         33          0
## 3     5  0.7368177 0.461233235         48          0
## 4     7  0.6518782 0.514479746         41          0
## 5     9  2.6383697 0.008330570         45          2
## 6    20  2.7919482 0.005239174         51          2
##                  classification
## 1 Cold Spot with 90% Confidence
## 2               Not significant
## 3               Not significant
## 4               Not significant
## 5  Hot Spot with 95% Confidence
## 6  Hot Spot with 95% Confidence

Since the “grid” I used for the spatial join only contains the boundaries for the counties of North Carolina, I am going to add a layer that includes the surrounding states to provide more context. Similar shapefiles of the United States can be found on the United States Census Bureau’s website. Note that you may need to adjust some parameters in the following code block to match the attributes in your data set.

We are also going to make the map easier to read by adding a color palette with reds for “hot spots” and blues for “cold spots”. I am using ggplot2 for mapping here, but you can use another mapping package such as leaflet if you’d like.

library(ggplot2)

# Create a corresponding, diverging color palette
colpal <- c("#b2182b", "#ef8a62", "#fddbc7", "#f7f7f7", "#d1e5f0", "#67a9cf", "#2166ac")

# Specify the levels of classification for use in the plot
gilevels <- c("Hot Spot with 99% Confidence", 
              "Hot Spot with 95% Confidence", 
              "Hot Spot with 90% Confidence", 
              "Not significant", 
              "Cold Spot with 90% Confidence", 
              "Cold Spot with 95% Confidence", 
              "Cold Spot with 99% Confidence")

# Load shapefile of the United States
USA <- read_sf(dsn = "./shapefiles", layer = "CB_USStates_WGS84")

# Create a bounding box of North Carolina to adjust the view of the map
bbox_usa <- USA %>%
  filter(NAME == "North Carolina") %>%
  st_bbox() %>%
  as.list()

# Plot the data using ggplot
hotspot.map <- ggplot() +
  geom_sf(data = USA,
          fill = "palegreen4",
          colour = "black") +
  geom_sf(data = hotspots_df, aes(fill = classification),
          color = "black", 
          lwd = 0.1) +
  scale_fill_manual(name = "Hot Spots", 
                    values = colpal, 
                    breaks = gilevels) +
  coord_sf(xlim = c(bbox_usa$xmin, bbox_usa$xmax),
           ylim = c(bbox_usa$ymin, bbox_usa$ymax)) +
  theme_classic() +
  theme(panel.background = element_rect(fill = "white"),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        legend.position = "bottom",
        legend.title = element_text(size=8), legend.text = element_text(size=6),
        legend.background = element_blank())

# Print the map
print(hotspot.map)

The result should look something like this:

The finished map can be exported as a .png file, pasted in an RMarkdown document, integrated in an officer report, and more. You can use the same steps in a loop, since the ArcGIS Pro files will overwrite (based on the setup parameters), creating multiple maps at once for use in reports or other projects.

Debugging

Inevitably, the bridge will break at some point in time. These failures are usually due to incompatible versions or new updates available in ArcGIS Pro. Occasionally, the parameters for a tool will change as well. Start by ensuring that you have an active ArcGIS Pro license. Update your ArcGIS Pro software if it needs to be updated. Check your interpreter to ensure that the path to “arcgispro-py3” is still viable. If you cloned your active environment in ArcGIS Pro to use as the interpreter, try using the default active environment. Check the documentation for running the tools in Python by visiting each tool’s ArcGIS webpage and looking through the Python Parameters box. Check if the reticulate package in R needs updating. As a last resort, update your version of R and/or RStudio if need be. If the problems persist to no avail, consult different forums, github, or Esri.

Disclaimer

This tutorial was created by a non-expert for educational purposes only. While efforts have been made to ensure the accuracy of the information provided, users should verify any information before applying it to their own analysis. The author accepts no responsibility or liability for any errors or omissions, or for any damage or loss arising from the use of this tutorial.