Background
My lab (aka me and my advisor T. Gillespie) is collaborating with two LA-based projects (at the City and County) that are interested in using local species data to assess biodiversity/sustainability. Both projects want to use the presence of certain indicator species to reflect, to some extent, variables like habitat quality and species richness. A set of (as yet unfinalized) calculations will convert the species data to an area-based “score,” which will then be added up with several other variables/indicators.
The availability of species data is obviously key here. Because species occurrence data obtained through routine, controlled monitoring are very difficult to come by, these projects are interested in the utility of citizen science inventories like iNaturalist. Citizen science databases are extremely valuable, as they offer access to the pooled efforts and observations of many more people (a lot more frequently) than you might have otherwise. The disadvantages, however, include some degree of spatial sampling bias among the observations and potentital inaccuracies in species identifications.
We’ve been asked to help evaluate the utility and availability of iNaturalist data for 119 indicator species. We need to provide the number of accurate, LA city records for each species, and simple maps of their distributions. Once we review the coverage of the iNaturalist data with our collaborators, we’ll use this data to score LA’s non-urban patches and urban areas. Scoring methods are still being finalized, but we expect them to vary by taxa and indicator species group. There are 5 groups in total: common native fauna (found mostly in “natural” areas), common native flora found mostly in “natural” areas), native fauna that can commonly be found in urban areas, rare native species of concern, and invasive species.
Objectives
Write a script that:
-
Filters through the iNaturalist data,
-
Exports a clean .csv and .shp for each species,
-
Produces some simple maps,
-
Calculates area-based scores for different groups of species, and
-
Can be easily updated to reflect changes in the indicator species lists, units of analysis, and scoring methods
Main Steps
I am dividing the procedures into two parts. Part one (steps 1-9) will clean and visualize the data on a species-by-species basis. Part two includes the steps for scoring.
-
Identify and select records that: a) have accuracy radius <= 10m and b) Match the listed indicator species/subspecies (unless multiple species are to be used as indicators)
-
Export columns for species scientific name, common name, longitude, latitude, observation date, and iNat user info to CSV
-
Export shapefile from iNat records CSV
-
Create a subset of records using the LA City boundary
-
Count number of species records in city (total), in the patch area, and in the urban area, and write to table
-
Identify patches and/or grids that contain species points
-
Export patches and/or grids to new shapefiles
-
Update table with number of patches and urban grids that contain species records
-
Map a) species point records and b) patches/grids containing iNat data
-
Using Biodiversity Index methods, calculate Indicator score for each species group
-
Map Indicator scores
Getting Started
library(tidyverse)
library(sp)
library(rgdal)
library(rgeos)
library(raster)
library(ggplot2)
library(tmap)
Should I need to re-run this script in future, the only thing I want to have to update is this line, setting the work directory:
setwd('C:/Users/monica/Documents/Bio_Index/species_data')
One important input file is this sort of reference list, which contains some information about the indicator species. This too could potentially be updated in future. Let’s take a peek:
ind_spp <- read_csv('indicator_species_fn.csv')
Parsed with column specification:
cols(
indicator_group = col_character(),
taxa = col_character(),
scientific_name = col_character(),
common_name = col_character(),
multiple_spp = col_logical(),
ranking = col_character(),
csv_fn = col_character()
)
head(ind_spp)
# A tibble: 6 x 7
indicator_group taxa scientific_name common_name multiple_spp ranking csv_fn
<chr> <chr> <chr> <chr> <lgl> <chr> <chr>
1 common_native_flora Plant Abronia umbellata Sand Verbena FALSE 2 a b c Abronia_umbellata
2 common_native_flora Plant Artemisia californica California sagebrush FALSE 2 a b c Artemisia_californica
3 common_native_flora Plant Baccharis pilularis Coyote brush FALSE 2 a b c Baccharis_pilularis
4 common_native_flora Plant Baccharis salicifolia Mulefat FALSE 2 a b c Baccharis_salicifolia
5 common_native_flora Plant Camissoniopsis cheiranthifolia Beach evening-primrose FALSE 2 a b c Camissoniopsis_cheiranthifolia
6 common_native_flora Plant Castilleja exserta Purple owls-clover FALSE 2 a b c Castilleja_exserta
# How many unique values are there for each variable?
sapply(ind_spp, function(x) length(unique(x)))
indicator_group taxa scientific_name common_name multiple_spp ranking csv_fn
5 7 119 119 2 7 117
# How many species are in each indicator group?
spp_per_group <- ind_spp %>%
group_by(indicator_group) %>%
summarise(species = n())
print(spp_per_group)
# A tibble: 5 x 2
indicator_group species
<chr> <int>
1 common_native_fauna 25
2 common_native_flora 25
3 common_native_urban 15
4 invasive 19
5 rare_native 35
The other inputs are some shapefiles for LA City. I’ll also assign their coordinate system (WGS 84) to a variable for later use:
LAcity_boundary <- readOGR('la_shp', 'la_city_griddissolve_wgs84')
cv_patch <- readOGR('la_shp', 'calveg_patch_wgs84')
ctg_patch <- readOGR('la_shp', 'contig_patch_wgs84')
urbn_grid <- readOGR('la_shp', 'urbn_grid_wgs84')
wgs84crs <- crs(LAcity_boundary)
We’re all set up! And now that we have the shapefiles, I can show you what the extent of our study area, and the habitat “patches”, look like:
tm_shape(LAcity_boundary, projection = 32311) +
tm_fill(col = "#f0f0e4", title = "Urban area") +
tm_shape(cv_patch) +
tm_fill(col = "SI_CLASS",
palette = c("#f8da19","#ffc3af","#8bc34a","#3a554b","#970707","#de3c3c","#68bde1"),
title = "Singapore Index Class") +
tm_layout(main.title = "City of Los Angeles",
main.title.size = 0.95,
frame = FALSE,
inner.margins = 0.02,
legend.text.size = 0.55,
legend.title.size = 0.9,
legend.width = 1.5,
legend.outside = TRUE)

Part One
There are three main stages in Part One:
-
Steps 1-5, in which we will clean, filter, count, and export the records in the iNaturalist .csv’s that we (aka someone in the Grand Challenges office) have downloaded
-
Steps 6-8, in which we subset the species records using patch and urban grid polygons, count up the number of occupied patches/grids, and write those counts to a table
-
Step 9, in which we finally make some maps!
I’ll start by listing the species in each indicator group (these groups are used to organize the downloaded iNaturalist .csv’s and output files). We’ll also add new columns to the ind_spp dataframe, which will eventually be populated with the number of records in each shapefile we export.
ind_spp_groups <- list.dirs('iNat_downloads', full.names = FALSE, recursive = FALSE)
ind_spp$in_city <- 0
ind_spp$in_patches <- 0
ind_spp$in_urban <- 0
ind_spp$cvpatch_count <- 0
ind_spp$ctgpatch_count <- 0
ind_spp$urbn_count <- 0
The main component here is the big for loop, which I will not run in this R Notebook because 1) I have already exported all the files and 2) it really wouldn’t show you a very interesting output. But I’ve detailed each step of the loop in the comments. Loops are not as scary as I once told myself they were!
# Loop through the iNat downloads, one indicator_group at a time
for(ind_group in ind_spp_groups) {
# List the species in each indicator group (using the iNaturalist downloads)
iNat_csv_list <- list.files(paste0('iNat_downloads/',ind_group, collapse=""))
for(species in iNat_csv_list) {
# Read in each species csv; list the variables we will keep upon export
iNat_pts <- read_csv(paste0('iNat_downloads/',ind_group,'/',species, collapse=""))
species_nm <- gsub(".csv", "", species)
keep_cols <- c('scientific_name', 'common_name', 'longitude', 'latitude', 'positional_accuracy', 'observed_on', 'user_id', 'user_login')
# Find the row in ind_spp that lists the current species' details
refrow <- ind_spp %>%
filter(csv_fn == species_nm)
# Using the reference row, determine whether we want to filter for positional accuracy AND a specific species, or for positional accuracy alone.
if(refrow$multiple_spp == FALSE) {
new_csv <- iNat_pts %>%
filter((positional_accuracy <= 10) & (positional_accuracy >= 0)) %>%
filter(scientific_name == refrow$scientific_name) %>%
dplyr::select(keep_cols)
} else {
new_csv <- iNat_pts %>%
filter((positional_accuracy <= 10) & (positional_accuracy >= 0)) %>%
dplyr::select(keep_cols)
}
# Does the species have any accurate records to export?
if((nrow(new_csv)) > 0) {
# If yes, write new csv...
write_csv(new_csv, path = paste0('iNat_cleaned_csv/',ind_group,'/',species, collapse=""))
# Create a new folder for the shapefile, if it does not already exist (we will later create additional shapefiles for each species, so want individual folders)
dir.create(file.path(paste0('species_shp/',ind_group,'/',species_nm)), showWarnings = FALSE)
# Create shapefile from csv - unfortunately, the shapefile() function abbreviates the field names (7 characters max?!?!!? oh well)
sp_xy <- cbind(new_csv$longitude, new_csv$latitude)
sp_pts_all <- SpatialPointsDataFrame(sp_xy,
new_csv,
proj4string = wgs84crs)
shapefile(sp_pts_all, paste0('species_shp/',ind_group,'/',species_nm,'/',species, collapse=""), overwrite = TRUE)
rm(sp_xy)
# Clip spatial object to LA City boundary and print number of records that fall within LA city, the patches, and the urban area to original csv.
# NOTE ON (gIntersection): after much moanin' and groanin', I found that you can SUBSET SPATIALPOINTSDATAFRAMES. NO NEED FOR GINTERSECTS, FOR OVERLAYS, FOR NOTHIN.
sp_pts_LA <- sp_pts_all[LAcity_boundary, ]
ind_spp$in_city <- ifelse(ind_spp$csv_fn == species_nm,
length(sp_pts_LA),
ind_spp$in_city)
sp_pts_patch <- sp_pts_all[ctg_patch, ]
ind_spp$in_patches <- ifelse(ind_spp$csv_fn == species_nm,
length(sp_pts_patch),
ind_spp$in_patches)
sp_pts_urbn <- sp_pts_all[urbn_grid, ]
ind_spp$in_urban <- ifelse(ind_spp$csv_fn == species_nm,
length(sp_pts_urbn),
ind_spp$in_urban)
} else {next}
# Steps 6-8 = Subset species records using patch and urban grid polygons, write counts to table
if(length(sp_pts_LA) > 0) {
if((length(sp_pts_patch) > 0) & (length(sp_pts_urbn) > 0)) {
ctg_sub <- ctg_patch[sp_pts_LA, ]
shapefile(ctg_sub, paste0('species_shp/',ind_group,'/',species_nm,'/',species_nm,'_ctg', collapse=""), overwrite = TRUE)
ind_spp$ctgpatch_count <- ifelse(ind_spp$csv_fn == species_nm,
length(ctg_sub),
ind_spp$ctgpatch_count)
cv_sub <- cv_patch[sp_pts_LA, ]
shapefile(cv_sub, paste0('species_shp/',ind_group,'/',species_nm,'/',species_nm,'_cv', collapse=""), overwrite = TRUE)
ind_spp$cvpatch_count <- ifelse(ind_spp$csv_fn == species_nm,
length(cv_sub),
ind_spp$cvpatch_count)
urbn_sub <- urbn_grid[sp_pts_LA, ]
shapefile(urbn_sub, paste0('species_shp/',ind_group,'/',species_nm,'/',species_nm,'_urbn', collapse=""), overwrite = TRUE)
ind_spp$urbn_count <- ifelse(ind_spp$csv_fn == species_nm,
length(urbn_sub),
ind_spp$urbn_count)
} else if ((length(sp_pts_patch) > 0) & (length(sp_pts_urbn) == 0)) {
ctg_sub <- ctg_patch[sp_pts_LA, ]
shapefile(ctg_sub, paste0('species_shp/',ind_group,'/',species_nm,'/',species_nm,'_ctg', collapse=""), overwrite = TRUE)
ind_spp$ctgpatch_count <- ifelse(ind_spp$csv_fn == species_nm,
length(ctg_sub),
ind_spp$ctgpatch_count)
cv_sub <- cv_patch[sp_pts_LA, ]
shapefile(cv_sub, paste0('species_shp/',ind_group,'/',species_nm,'/',species_nm,'_cv', collapse=""), overwrite = TRUE)
ind_spp$cvpatch_count <- ifelse(ind_spp$csv_fn == species_nm,
length(cv_sub),
ind_spp$cvpatch_count)
} else {
urbn_sub <- urbn_grid[sp_pts_LA, ]
shapefile(urbn_sub, paste0('species_shp/',ind_group,'/',species_nm,'/',species_nm,'_urbn', collapse=""), overwrite = TRUE)
ind_spp$urbn_count <- ifelse(ind_spp$csv_fn == species_nm,
length(urbn_sub),
ind_spp$urbn_count)
}
# Step 9 = Map 1) species point records and 2) presence across patches/grids
# Compose map title
common <- refrow$common_name
scientific <- refrow$scientific_name
map_title <- paste0(common, "\n(",scientific, ")") #After much quality Google time, it seems that italicizing part of a text element in tmap can't be done. Normally, one would wrap the to-be-italicized text so: expression(italic(sci_name))), but tmap does not accept expressions.
# Create point records map
point_map <- tm_shape(LAcity_boundary, projection = 32311) +
tm_fill(col = "#f0f0e4") +
tm_shape(ctg_patch) +
tm_fill(col = "#D8D8CD") +
tm_shape(sp_pts_LA) +
tm_symbols(col = "#00DFB3", border.col = "#ffffff", border.lwd = 0.3, scale = 0.3, shape = 21) +
tm_layout(main.title = map_title,
title = paste("iNaturalist Records in\nCity of LA =", length(sp_pts_LA)),
title.size = 0.9,
title.position = c("left", "bottom"),
frame = FALSE,
inner.margins = 0.06)
# For presence map, choose between species contiguous and Calveg patch files, depending on taxa
if(refrow$taxa == "Mammal") {
poly_map <- tm_shape(LAcity_boundary, projection = 32311) +
tm_fill(col = "#f0f0e4") +
tm_shape(ctg_patch) +
tm_fill(col = "#D8D8CD") +
tm_shape(ctg_sub) +
tm_fill(col = "#00DFB3") +
tm_shape(urbn_sub) +
tm_fill(col = "#00DFB3") +
tm_layout(main.title = " ",
title = "Presence by Contiguous Patch/\nUrban Grid (0.25 mi)",
title.size = 0.9,
title.position = c("left", "bottom"),
frame = FALSE,
inner.margins = 0.06)
} else {
poly_map <- tm_shape(LAcity_boundary, projection = 32311) +
tm_fill(col = "#f0f0e4") +
tm_shape(ctg_patch) +
tm_fill(col = "#D8D8CD") +
tm_shape(cv_sub) +
tm_fill(col = "#00DFB3") +
tm_shape(urbn_sub) +
tm_fill(col = "#00DFB3") +
tm_layout(main.title = " \n ",
title = "Presence by CalVeg Patch/\nUrban Grid (0.25 mi)",
title.size = 0.9,
title.position = c("left", "bottom"),
frame = FALSE,
inner.margins = 0.06)
}
# Export a PDF with the point and polygon maps
pdf(paste0('pdf_previews/',ind_group,'/',species_nm,'.pdf'), paper = "USr", width = 10.3, height = 7.8)
print(tmap_arrange(point_map, poly_map))
dev.off()
} else {next}
# Write updated ind_spp to a new csv
write_csv(ind_spp, path = 'indicator_species_fn_updated.csv')
}
}
Part One Results
Since I’m not running that main chunk of Part One, I’ll demonstrate what the PDF output should look like with one of my favorite species, the red-tailed hawk. Some of the objects have been re-specified, as all of the shapefiles are already ready to go. Here’s a shot of them hanging out in their folder:
library(knitr)
include_graphics('hawk_preview.png')

redtail_pts <- readOGR('species_shp/common_native_fauna/Buteo_jamaicensis', 'Buteo_jamaicensis')
OGR data source with driver: ESRI Shapefile
Source: "C:\Users\monica\Documents\Bio_Index\species_data\species_shp\common_native_fauna\Buteo_jamaicensis", layer: "Buteo_jamaicensis"
with 916 features
It has 8 fields
ctg_sub <- readOGR('species_shp/common_native_fauna/Buteo_jamaicensis', 'Buteo_jamaicensis_ctg')
OGR data source with driver: ESRI Shapefile
Source: "C:\Users\monica\Documents\Bio_Index\species_data\species_shp\common_native_fauna\Buteo_jamaicensis", layer: "Buteo_jamaicensis_ctg"
with 22 features
It has 1 fields
cv_sub <- readOGR('species_shp/common_native_fauna/Buteo_jamaicensis', 'Buteo_jamaicensis_cv')
OGR data source with driver: ESRI Shapefile
Source: "C:\Users\monica\Documents\Bio_Index\species_data\species_shp\common_native_fauna\Buteo_jamaicensis", layer: "Buteo_jamaicensis_cv"
with 45 features
It has 1 fields
urbn_sub <- readOGR('species_shp/common_native_fauna/Buteo_jamaicensis', 'Buteo_jamaicensis_urbn')
OGR data source with driver: ESRI Shapefile
Source: "C:\Users\monica\Documents\Bio_Index\species_data\species_shp\common_native_fauna\Buteo_jamaicensis", layer: "Buteo_jamaicensis_urbn"
with 41 features
It has 1 fields
sp_pts_LA <- redtail_pts[LAcity_boundary, ]
refrow <- ind_spp %>%
filter(csv_fn == "Buteo_jamaicensis")
common <- refrow$common_name
scientific <- refrow$scientific_name
map_title <- paste0(common, "\n(",scientific, ")")
# Create point records map
point_map <- tm_shape(LAcity_boundary, projection = 32311) +
tm_fill(col = "#f0f0e4") +
tm_shape(ctg_patch) +
tm_fill(col = "#D8D8CD") +
tm_shape(sp_pts_LA) +
tm_symbols(col = "#00DFB3", border.col = "#ffffff", border.lwd = 0.3, scale = 0.3, shape = 21) +
tm_layout(main.title = map_title,
title = paste("iNaturalist Records in\nCity of LA =", length(sp_pts_LA)),
title.size = 0.9,
title.position = c("left", "bottom"),
frame = FALSE,
inner.margins = 0.06)
# For presence map, choose between species contiguous and Calveg patch files, depending on taxonomic group
if(refrow$taxa == "Mammal") {
poly_map <- tm_shape(LAcity_boundary, projection = 32311) +
tm_fill(col = "#f0f0e4") +
tm_shape(ctg_patch) +
tm_fill(col = "#D8D8CD") +
tm_shape(ctg_sub) +
tm_fill(col = "#00DFB3") +
tm_shape(urbn_sub) +
tm_fill(col = "#00DFB3") +
tm_layout(main.title = " ",
title = "Presence by Contiguous Patch/\nUrban Grid (0.25 mi)",
title.size = 0.9,
title.position = c("left", "bottom"),
frame = FALSE,
inner.margins = 0.06)
} else {
poly_map <- tm_shape(LAcity_boundary, projection = 32311) +
tm_fill(col = "#f0f0e4") +
tm_shape(ctg_patch) +
tm_fill(col = "#D8D8CD") +
tm_shape(cv_sub) +
tm_fill(col = "#00DFB3") +
tm_shape(urbn_sub) +
tm_fill(col = "#00DFB3") +
tm_layout(main.title = " \n ",
title = "Presence by CalVeg Patch/\nUrban Grid (0.25 mi)",
title.size = 0.9,
title.position = c("left", "bottom"),
frame = FALSE,
inner.margins = 0.06)
}
tmap_arrange(point_map, poly_map)

And as proof that the whole process worked - here are the common native fauna PDFs, all happily nestled away in the appropriate directory:
include_graphics('pdf_preview.png')

Let’s also take a look at some of the data that we added to the ind_spp dataframe using ggplot. A preview of those new columns:
ind_spp_2 <- read_csv('indicator_species_fn_updated.csv')
Parsed with column specification:
cols(
indicator_group = col_character(),
taxa = col_character(),
scientific_name = col_character(),
common_name = col_character(),
multiple_spp = col_logical(),
ranking = col_character(),
csv_fn = col_character(),
in_city = col_integer(),
in_patches = col_integer(),
in_urban = col_integer(),
cvpatch_count = col_integer(),
ctgpatch_count = col_integer(),
urbn_count = col_integer()
)
head(ind_spp_2)
# A tibble: 6 x 13
indicator_group taxa scientific_name common_name multiple_spp ranking csv_fn in_city in_patches in_urban cvpatch_count ctgpatch_count
<chr> <chr> <chr> <chr> <lgl> <chr> <chr> <int> <int> <int> <int> <int>
1 common_native_~ Plant Abronia umbell~ Sand Verbe~ FALSE 2 a b c Abron~ 6 5 1 1 1
2 common_native_~ Plant Artemisia cali~ California~ FALSE 2 a b c Artem~ 182 118 63 74 22
3 common_native_~ Plant Baccharis pilu~ Coyote bru~ FALSE 2 a b c Bacch~ 77 60 18 35 13
4 common_native_~ Plant Baccharis sali~ Mulefat FALSE 2 a b c Bacch~ 0 0 0 0 0
5 common_native_~ Plant Camissoniopsis~ Beach even~ FALSE 2 a b c Camis~ 2 1 1 1 1
6 common_native_~ Plant Castilleja exs~ Purple owl~ FALSE 2 a b c Casti~ 2 1 1 1 1
# ... with 1 more variable: urbn_count <int>
Which indicator species groups do we have the most data for?
available <- ind_spp_2 %>%
mutate(records = ifelse(in_city > 0, "Yes", "No"))
ggplot(available, aes(x = indicator_group, fill = records)) +
geom_bar(position = "fill", width = 0.27) +
geom_text(stat='count', aes(label=..count..), size = 3.5, position = position_fill(vjust = 0.5)) +
scale_fill_discrete(name = "Records available?") +
coord_flip() +
scale_x_discrete(labels = c("Common\nnative fauna", "Common\nnative flora", "Common\nurban sp.", "Invasive", "Rare native")) +
labs(title = "iNaturalist Records in City of LA", subtitle = "Availability by species; labels indicate number of species", x = "", y = "% of species") +
theme_minimal() +
theme(plot.title = element_text(vjust = 2, face = "bold"),
plot.subtitle = element_text(color = "gray60"),
axis.title.x = element_text(vjust = -1, color = "gray60"))

Availability is lower for rare natives and invasive species, which is not unexpected (some of the invasive species are relatively newer to the region or harder to observe). Which taxa are most represented in the zero records group? We’ll lump common native flora and fauna together here for a little more visual clarity.
no_la_records <- ind_spp_2 %>%
filter(in_city == 0) %>%
mutate(group_2 = indicator_group) %>%
mutate(group_2 = sub("_fauna", "_ff", group_2)) %>%
mutate(group_2 = sub("_flora", "_ff", group_2))
ggplot(no_la_records, aes(x = taxa, fill = group_2)) +
geom_bar(position = position_dodge2(preserve = "single")) +
coord_flip() +
scale_y_continuous(breaks = seq(0, 10, by = 2)) +
scale_fill_manual(values = c("#53d397","#cce490", "#ff5f5f","#fbd341"),
labels = c("Common native flora/fauna", "Common urban sp.", "Invasive", "Rare native"),
name = "Indicator\nspecies group") +
labs(title = "Which indicator species are missing iNaturalist data in the City of LA?",
subtitle = "By taxonomic group",
x = "", y = "No. of species") +
theme_minimal() +
theme(plot.title = element_text(vjust = 2, face = "bold"),
plot.subtitle = element_text(color = "gray60"),
axis.title.x = element_text(vjust = -1, color = "gray60"))

Invertebrates are the toughest species to get records for (1 out of 1 algae species also has no records) - surprise! People would much rather chronicle their encounters with bobcats than with crickets.
Of the species we do have records for, how many are being observed in the urban areas vs. habitat patches?
la_records <- available %>%
filter(records == "Yes") %>%
group_by(taxa) %>%
summarise(in_patches = sum(in_patches, na.rm = TRUE),
in_urban = sum(in_urban, na.rm = TRUE)) %>%
gather(key = "location", value = "records", 2:3)
ggplot(la_records, aes(x = taxa, y = records, fill = location)) +
geom_col(position = "fill", width = 0.3) +
geom_text(aes(label = records), size = 3.5, position = position_fill(vjust = 0.5)) +
scale_fill_manual(name = "",
labels = c("Habitat patches", "Urban areas"),
values = c("#7dc383", "#D8D8CD")) +
coord_flip() +
labs(title = "Where are species spotted in the City of LA?", subtitle = "Labels indicate total iNaturalist records per taxonomic group", x = "", y = "% of species") +
theme_minimal() +
theme(plot.title = element_text(vjust = 2, face = "bold"),
plot.subtitle = element_text(color = "gray60"),
axis.title.x = element_text(vjust = -1, color = "gray60"))

Interesting! Observers spot more mammals, invertebrates, and birds in urban areas than they do in habitat patches. There are a few reasonable explanations for this, one obviously being that 12% of the indicator species are in the “common native in urban area” group. Also, some mammals, like the invasive fox squirrel, truly are more common in urban areas. But for other species, this difference between urban and patch observations could be a reflection of sampling bias. Meanwhile, amphibians were more likely to be observed in habitat patches - probably something to do with their very specific habitat needs (I believe all of LA’s riparian resources are all classified as habitat patches). The low proportion of plants recorded in urban areas could be due to the dominance of non-native ornamentals in managed landscapes, or our exclusion of records of “captive” species, or both.
What’s Next
Part two is next! Once we have some scoring methods, I have a pretty good idea of how I might put together the script. The analysis itself probably won’t be so complex, but like the procedures in Part One, it will likely be tested and changed multiple times as the methods are developed. Writing the script so far has been very educational, and I think it will help the rest of the process go much smoother! Conversely, the more I work with the loops, the more I understand all the little places where the code can be pruned and streamlined. So I hope that working through the next part of this project will enable me to go back and improve the first half.
There are things to improve on the maps themselves as well. Certain issues that feel relatively small, but still escape me. These include italicizing just part of a title in a tmap object, and customizing a pdf layout that includes tmaps (e.g. adding a title and some text). I would really like to make the scored maps interactive - though I’m not totally sure how I would share them if I continue to use tmap (maybe just the HTML page?). That brings me to the next item on my wish list: expanding my mapping-in-R skillset beyond tmap. Thus far, it’s been my favorite because it so easily handles shapefiles. ggplot seems not so straightforward when it comes to polygons, and gganimate is intimidating. But tmap is not without its limitations, so I’m excited to try and wrangle the other methods!
All species data from iNaturalist. http://www.inaturalist.org
---
title: "Cleaning, Scoring, and Mapping iNaturalist Data"
subtitle: "Monica Dimson"
output: html_notebook
---

<b>Background</b>

My lab (aka me and my advisor T. Gillespie) is collaborating with two LA-based projects (at the City and County) that are interested in using local species data to assess biodiversity/sustainability. Both projects want to use the presence of certain indicator species to reflect, to some extent, variables like habitat quality and species richness. A set of (as yet unfinalized) calculations will convert the species data to an area-based "score," which will then be added up with several other variables/indicators.

The availability of species data is obviously key here. Because species occurrence data obtained through routine, controlled monitoring are very difficult to come by, these projects are interested in the utility of citizen science inventories like iNaturalist. Citizen science databases are extremely valuable, as they offer access to the pooled efforts and observations of many more people (a lot more frequently) than you might have otherwise. The disadvantages, however, include some degree of spatial sampling bias among the observations and potentital inaccuracies in species identifications. 

We've been asked to help evaluate the utility and availability of iNaturalist data for 119 indicator species. We need to provide the number of accurate, LA city records for each species, and simple maps of their distributions. Once we review the coverage of the iNaturalist data with our collaborators, we'll use this data to score LA's non-urban patches and urban areas. Scoring methods are still being finalized, but we expect them to vary by taxa and indicator species group. There are 5 groups in total: common native fauna (found mostly in "natural" areas), common native flora found mostly in "natural" areas), native fauna that can commonly be found in urban areas, rare native species of concern, and invasive species. 
 
<hr>
<b>Objectives</b>

Write a script that:
<ul>
<li>Filters through the iNaturalist data,</li>
<li>Exports a clean .csv and .shp for each species,</li>
<li>Produces some simple maps,</li>
<li>Calculates area-based scores for different groups of species, and</li>
<li>Can be easily updated to reflect changes in the indicator species lists, units of analysis, and scoring methods</li>
</ul>

<hr>
<b>Main Steps</b>

I am dividing the procedures into two parts. Part one (steps 1-9) will clean and visualize the data on a species-by-species basis. Part two includes the steps for scoring. 
<ol>
<li>Identify and select records that: a) have accuracy radius <= 10m and b) Match the listed indicator species/subspecies (unless multiple species are to be used as indicators)</li>
<li>Export columns for species scientific name, common name, longitude, latitude, observation date, and iNat user info to CSV</li>
<li>Export shapefile from iNat records CSV</li>
<li>Create a subset of records using the LA City boundary</li>
<li>Count number of species records in city (total), in the patch area, and in the urban area, and write to table</li>
<li>Identify patches and/or grids that contain species points</li>
<li>Export patches and/or grids to new shapefiles</li>
<li>Update table with number of patches and urban grids that contain species records</li>
<li>Map a) species point records and b) patches/grids containing iNat data</li>
<li>Using Biodiversity Index methods, calculate Indicator score for each species group</li>
<li>Map Indicator scores</li>
</ol>

<hr>
<b>Getting Started</b>


```{r}
library(tidyverse)
library(sp)
library(rgdal)
library(rgeos)
library(raster)
library(ggplot2)
library(tmap)
```

Should I need to re-run this script in future, the only thing I want to *have* to update is this line, setting the work directory: 

```{r}
setwd('C:/Users/monica/Documents/Bio_Index/species_data')
```

One important input file is this sort of reference list, which contains some information about the indicator species. This too could potentially be updated in future. Let's take a peek:

```{r}
ind_spp <- read_csv('indicator_species_fn.csv')
head(ind_spp)
# How many unique values are there for each variable?
sapply(ind_spp, function(x) length(unique(x)))
# How many species are in each indicator group?
spp_per_group <- ind_spp %>%
  group_by(indicator_group) %>%
  summarise(species = n())
print(spp_per_group)
```

The other inputs are some shapefiles for LA City. I'll also assign their coordinate system (WGS 84) to a variable for later use:

```{r}
LAcity_boundary <- readOGR('la_shp', 'la_city_griddissolve_wgs84')
cv_patch <- readOGR('la_shp', 'calveg_patch_wgs84')
ctg_patch <- readOGR('la_shp', 'contig_patch_wgs84')
urbn_grid <- readOGR('la_shp', 'urbn_grid_wgs84')

wgs84crs <- crs(LAcity_boundary)
```

We're all set up! And now that we have the shapefiles, I can show you what the extent of our study area, and the habitat "patches", look like:

```{r}
tm_shape(LAcity_boundary, projection = 32311) +
  tm_fill(col = "#f0f0e4", title = "Urban area") +
tm_shape(cv_patch) +
  tm_fill(col = "SI_CLASS",
          palette = c("#f8da19","#ffc3af","#8bc34a","#3a554b","#970707","#de3c3c","#68bde1"),
          title = "Singapore Index Class") +
  tm_layout(main.title = "City of Los Angeles",
            main.title.size = 0.95,
            frame = FALSE,
            inner.margins = 0.02,
            legend.text.size = 0.55,
            legend.title.size = 0.9,
            legend.width = 1.5,
            legend.outside = TRUE)
```


<hr>
<b>Part One</b>

There are three main stages in Part One:
<ul>
<li><b>Steps 1-5,</b> in which we will clean, filter, count, and export the records in the iNaturalist .csv's that we (aka someone in the Grand Challenges office) have downloaded</li>
<li><b> Steps 6-8,</b> in which we subset the species records using patch and urban grid polygons, count up the number of occupied patches/grids, and write those counts to a table</li>
<li><b>Step 9,</b> in which we finally make some maps!</li>
</ul>

I'll start by listing the species in each indicator group (these groups are used to organize the downloaded iNaturalist .csv's and output files). We'll also add new columns to the ind_spp dataframe, which will eventually be populated with the number of records in each shapefile we export.

```{r}
ind_spp_groups <- list.dirs('iNat_downloads', full.names = FALSE, recursive = FALSE)
ind_spp$in_city <- 0
ind_spp$in_patches <- 0
ind_spp$in_urban <- 0
ind_spp$cvpatch_count <- 0
ind_spp$ctgpatch_count <- 0
ind_spp$urbn_count <- 0
```

The main component here is the big for loop, which I will not run in this R Notebook because 1) I have already exported all the files and 2) it really wouldn't show you a very interesting output. But I've detailed each step of the loop in the comments. Loops are not as scary as I once told myself they were!

```{r}
# Loop through the iNat downloads, one indicator_group at a time
for(ind_group in ind_spp_groups) {
  
  # List the species in each indicator group (using the iNaturalist downloads)
  iNat_csv_list <- list.files(paste0('iNat_downloads/',ind_group, collapse=""))
  
  for(species in iNat_csv_list) {
    
    # Read in each species csv; list the variables we will keep upon export
    iNat_pts <- read_csv(paste0('iNat_downloads/',ind_group,'/',species, collapse=""))
    species_nm <- gsub(".csv", "", species)
    keep_cols <- c('scientific_name', 'common_name', 'longitude', 'latitude', 'positional_accuracy', 'observed_on', 'user_id', 'user_login')
    
    # Find the row in ind_spp that lists the current species' details
    refrow <- ind_spp %>%
      filter(csv_fn == species_nm)
    
    # Using the reference row, determine whether we want to filter for positional accuracy AND a specific species, or for positional accuracy alone.
    if(refrow$multiple_spp == FALSE) {
      new_csv <- iNat_pts %>%
        filter((positional_accuracy <= 10) & (positional_accuracy >= 0)) %>%
        filter(scientific_name == refrow$scientific_name) %>%
        dplyr::select(keep_cols)
    } else {
      new_csv <- iNat_pts %>%
        filter((positional_accuracy <= 10) & (positional_accuracy >= 0)) %>%
        dplyr::select(keep_cols)
    }
      
    # Does the species have any accurate records to export?
    if((nrow(new_csv)) > 0) {

      # If yes, write new csv...
      write_csv(new_csv, path = paste0('iNat_cleaned_csv/',ind_group,'/',species, collapse=""))
      
      # Create a new folder for the shapefile, if it does not already exist (we will later create additional shapefiles for each species, so want individual folders)
      dir.create(file.path(paste0('species_shp/',ind_group,'/',species_nm)), showWarnings = FALSE)
      
      # Create shapefile from csv - unfortunately, the shapefile() function abbreviates the field names (7 characters max?!?!!? oh well)
      sp_xy <- cbind(new_csv$longitude, new_csv$latitude)
      sp_pts_all <- SpatialPointsDataFrame(sp_xy,
                                           new_csv,
                                           proj4string = wgs84crs)
      shapefile(sp_pts_all, paste0('species_shp/',ind_group,'/',species_nm,'/',species, collapse=""), overwrite = TRUE)
      rm(sp_xy)
      
      # Clip spatial object to LA City boundary and print number of records that fall within LA city, the patches, and the urban area to original csv.
      # NOTE ON (gIntersection): after much moanin' and groanin', I found that you can SUBSET SPATIALPOINTSDATAFRAMES. NO NEED FOR GINTERSECTS, FOR OVERLAYS, FOR NOTHIN.
      sp_pts_LA <- sp_pts_all[LAcity_boundary, ]
      ind_spp$in_city <- ifelse(ind_spp$csv_fn == species_nm,
                                length(sp_pts_LA),
                                ind_spp$in_city)
      sp_pts_patch <- sp_pts_all[ctg_patch, ]
      ind_spp$in_patches <- ifelse(ind_spp$csv_fn == species_nm,
                                   length(sp_pts_patch),
                                   ind_spp$in_patches)
      sp_pts_urbn <- sp_pts_all[urbn_grid, ]
      ind_spp$in_urban <- ifelse(ind_spp$csv_fn == species_nm,
                                   length(sp_pts_urbn),
                                   ind_spp$in_urban)
    } else {next}
    
# Steps 6-8 = Subset species records using patch and urban grid polygons, write counts to table
    
    if(length(sp_pts_LA) > 0) {
      
      if((length(sp_pts_patch) > 0) & (length(sp_pts_urbn) > 0)) {
        ctg_sub <- ctg_patch[sp_pts_LA, ]
        shapefile(ctg_sub, paste0('species_shp/',ind_group,'/',species_nm,'/',species_nm,'_ctg', collapse=""), overwrite = TRUE)
        ind_spp$ctgpatch_count <- ifelse(ind_spp$csv_fn == species_nm,
                                         length(ctg_sub),
                                         ind_spp$ctgpatch_count)
        cv_sub <- cv_patch[sp_pts_LA, ]
        shapefile(cv_sub, paste0('species_shp/',ind_group,'/',species_nm,'/',species_nm,'_cv', collapse=""), overwrite = TRUE)
        ind_spp$cvpatch_count <- ifelse(ind_spp$csv_fn == species_nm,
                                        length(cv_sub),
                                        ind_spp$cvpatch_count)
        
        urbn_sub <- urbn_grid[sp_pts_LA, ]
        shapefile(urbn_sub, paste0('species_shp/',ind_group,'/',species_nm,'/',species_nm,'_urbn', collapse=""), overwrite = TRUE)
        ind_spp$urbn_count <- ifelse(ind_spp$csv_fn == species_nm,
                                     length(urbn_sub),
                                     ind_spp$urbn_count)
        
      } else if ((length(sp_pts_patch) > 0) & (length(sp_pts_urbn) == 0)) {
        ctg_sub <- ctg_patch[sp_pts_LA, ]
        shapefile(ctg_sub, paste0('species_shp/',ind_group,'/',species_nm,'/',species_nm,'_ctg', collapse=""), overwrite = TRUE)
        ind_spp$ctgpatch_count <- ifelse(ind_spp$csv_fn == species_nm,
                                         length(ctg_sub),
                                         ind_spp$ctgpatch_count)
        cv_sub <- cv_patch[sp_pts_LA, ]
        shapefile(cv_sub, paste0('species_shp/',ind_group,'/',species_nm,'/',species_nm,'_cv', collapse=""), overwrite = TRUE)
        ind_spp$cvpatch_count <- ifelse(ind_spp$csv_fn == species_nm,
                                        length(cv_sub),
                                        ind_spp$cvpatch_count)
      } else {
        urbn_sub <- urbn_grid[sp_pts_LA, ]
        shapefile(urbn_sub, paste0('species_shp/',ind_group,'/',species_nm,'/',species_nm,'_urbn', collapse=""), overwrite = TRUE)
        ind_spp$urbn_count <- ifelse(ind_spp$csv_fn == species_nm,
                                     length(urbn_sub),
                                     ind_spp$urbn_count)
      }
      
# Step 9 = Map 1) species point records and 2) presence across patches/grids
      
      # Compose map title
      common <- refrow$common_name
      scientific <- refrow$scientific_name
      map_title <- paste0(common, "\n(",scientific, ")") #After much quality Google time, it seems that italicizing part of a text element in tmap can't be done. Normally, one would wrap the to-be-italicized text so: expression(italic(sci_name))), but tmap does not accept expressions.
      
      # Create point records map
      point_map <- tm_shape(LAcity_boundary, projection = 32311) +
        tm_fill(col = "#f0f0e4") +
        tm_shape(ctg_patch) +
        tm_fill(col = "#D8D8CD") +
        tm_shape(sp_pts_LA) +
        tm_symbols(col = "#00DFB3", border.col = "#ffffff", border.lwd = 0.3, scale = 0.3, shape = 21) +
        tm_layout(main.title = map_title,
                  title = paste("iNaturalist Records in\nCity of LA =", length(sp_pts_LA)),
                  title.size = 0.9,
                  title.position = c("left", "bottom"),
                  frame = FALSE,
                  inner.margins = 0.06)
      
      # For presence map, choose between species contiguous and Calveg patch files, depending on taxa
      if(refrow$taxa == "Mammal") {
        poly_map <- tm_shape(LAcity_boundary, projection = 32311) +
          tm_fill(col = "#f0f0e4") +
          tm_shape(ctg_patch) +
          tm_fill(col = "#D8D8CD") +
          tm_shape(ctg_sub) +
          tm_fill(col = "#00DFB3") +
          tm_shape(urbn_sub) +
          tm_fill(col = "#00DFB3") +
          tm_layout(main.title = " ",
                    title = "Presence by Contiguous Patch/\nUrban Grid (0.25 mi)",
                    title.size = 0.9,
                    title.position = c("left", "bottom"),
                    frame = FALSE,
                    inner.margins = 0.06)
      } else {
        poly_map <- tm_shape(LAcity_boundary, projection = 32311) +
          tm_fill(col = "#f0f0e4") +
          tm_shape(ctg_patch) +
          tm_fill(col = "#D8D8CD") +
          tm_shape(cv_sub) +
          tm_fill(col = "#00DFB3") +
          tm_shape(urbn_sub) +
          tm_fill(col = "#00DFB3") +
          tm_layout(main.title = " \n ",
                    title = "Presence by CalVeg Patch/\nUrban Grid (0.25 mi)",
                    title.size = 0.9,
                    title.position = c("left", "bottom"),
                    frame = FALSE,
                    inner.margins = 0.06)
      }
      
      # Export a PDF with the point and polygon maps
      pdf(paste0('pdf_previews/',ind_group,'/',species_nm,'.pdf'), paper = "USr", width = 10.3, height = 7.8)
      print(tmap_arrange(point_map, poly_map))
      dev.off()

    } else {next}
    
    # Write updated ind_spp to a new csv
    write_csv(ind_spp, path = 'indicator_species_fn_updated.csv')
    
  }
}
```

<hr>
<b>Part One Results</b>

Since I'm not running that main chunk of Part One, I'll demonstrate what the PDF output should look like with one of my favorite species, the red-tailed hawk. Some of the objects have been re-specified, as all of the shapefiles are already ready to go. Here's a shot of them hanging out in their folder:

```{r}
library(knitr)
include_graphics('hawk_preview.png')
```


```{r}
redtail_pts <- readOGR('species_shp/common_native_fauna/Buteo_jamaicensis', 'Buteo_jamaicensis')
ctg_sub <- readOGR('species_shp/common_native_fauna/Buteo_jamaicensis', 'Buteo_jamaicensis_ctg')
cv_sub <- readOGR('species_shp/common_native_fauna/Buteo_jamaicensis', 'Buteo_jamaicensis_cv')
urbn_sub <- readOGR('species_shp/common_native_fauna/Buteo_jamaicensis', 'Buteo_jamaicensis_urbn')
sp_pts_LA <- redtail_pts[LAcity_boundary, ]

refrow <- ind_spp %>%
  filter(csv_fn == "Buteo_jamaicensis")

common <- refrow$common_name
scientific <- refrow$scientific_name
map_title <- paste0(common, "\n(",scientific, ")")
      
# Create point records map
point_map <- tm_shape(LAcity_boundary, projection = 32311) +
  tm_fill(col = "#f0f0e4") +
  tm_shape(ctg_patch) +
  tm_fill(col = "#D8D8CD") +
  tm_shape(sp_pts_LA) +
  tm_symbols(col = "#00DFB3", border.col = "#ffffff", border.lwd = 0.3, scale = 0.3, shape = 21) +
  tm_layout(main.title = map_title,
            title = paste("iNaturalist Records in\nCity of LA =", length(sp_pts_LA)),
            title.size = 0.9,
            title.position = c("left", "bottom"),
            frame = FALSE,
            inner.margins = 0.06)
      
# For presence map, choose between species contiguous and Calveg patch files, depending on taxonomic group
if(refrow$taxa == "Mammal") {
  poly_map <- tm_shape(LAcity_boundary, projection = 32311) +
    tm_fill(col = "#f0f0e4") +
    tm_shape(ctg_patch) +
    tm_fill(col = "#D8D8CD") +
    tm_shape(ctg_sub) +
    tm_fill(col = "#00DFB3") +
    tm_shape(urbn_sub) +
    tm_fill(col = "#00DFB3") +
    tm_layout(main.title = " ",
              title = "Presence by Contiguous Patch/\nUrban Grid (0.25 mi)",
              title.size = 0.9,
              title.position = c("left", "bottom"),
              frame = FALSE,
              inner.margins = 0.06)
  } else {
    poly_map <- tm_shape(LAcity_boundary, projection = 32311) +
      tm_fill(col = "#f0f0e4") +
      tm_shape(ctg_patch) +
      tm_fill(col = "#D8D8CD") +
      tm_shape(cv_sub) +
      tm_fill(col = "#00DFB3") +
      tm_shape(urbn_sub) +
      tm_fill(col = "#00DFB3") +
      tm_layout(main.title = " \n ",
                title = "Presence by CalVeg Patch/\nUrban Grid (0.25 mi)",
                title.size = 0.9,
                title.position = c("left", "bottom"),
                frame = FALSE,
                inner.margins = 0.06)
    }
      
tmap_arrange(point_map, poly_map)

```

And as proof that the whole process worked - here are the common native fauna PDFs, all happily nestled away in the appropriate directory:

```{r}
include_graphics('pdf_preview.png')
```


Let's also take a look at some of the data that we added to the ind_spp dataframe using ggplot. A preview of those new columns:

```{r}
ind_spp_2 <- read_csv('indicator_species_fn_updated.csv')
head(ind_spp_2)
```

Which indicator species groups do we have the most data for?

```{r}
available <- ind_spp_2 %>%
  mutate(records = ifelse(in_city > 0, "Yes", "No"))

ggplot(available, aes(x = indicator_group, fill = records)) +
  geom_bar(position = "fill", width = 0.27) +
  geom_text(stat='count', aes(label=..count..), size = 3.5, position = position_fill(vjust = 0.5)) +
  scale_fill_discrete(name = "Records available?") +
  coord_flip() +
  scale_x_discrete(labels = c("Common\nnative fauna", "Common\nnative flora", "Common\nurban sp.", "Invasive", "Rare native")) +
  labs(title = "iNaturalist Records in City of LA", subtitle = "Availability by species; labels indicate number of species", x = "", y = "% of species") +
  theme_minimal() +
  theme(plot.title = element_text(vjust = 2, face = "bold"),
        plot.subtitle = element_text(color = "gray60"),
        axis.title.x = element_text(vjust = -1, color = "gray60"))
```

Availability is lower for rare natives and invasive species, which is not unexpected (some of the invasive species are relatively newer to the region or harder to observe). Which taxa are most represented in the zero records group? We'll lump common native flora and fauna together here for a little more visual clarity.

```{r}
no_la_records <- ind_spp_2 %>%
  filter(in_city == 0) %>%
  mutate(group_2 = indicator_group) %>%
  mutate(group_2 = sub("_fauna", "_ff", group_2)) %>%
  mutate(group_2 = sub("_flora", "_ff", group_2))

ggplot(no_la_records, aes(x = taxa, fill = group_2)) +
  geom_bar(position = position_dodge2(preserve = "single")) +
  coord_flip() +
  scale_y_continuous(breaks = seq(0, 10, by = 2)) +
  scale_fill_manual(values = c("#53d397","#cce490", "#ff5f5f","#fbd341"),
                    labels = c("Common native flora/fauna", "Common urban sp.", "Invasive", "Rare native"),
                    name = "Indicator\nspecies group") +
  labs(title = "Which indicator species are missing iNaturalist data in the City of LA?",
       subtitle = "By taxonomic group",
       x = "", y = "No. of species") +
  theme_minimal() +
  theme(plot.title = element_text(vjust = 2, face = "bold"),
        plot.subtitle = element_text(color = "gray60"),
        axis.title.x = element_text(vjust = -1, color = "gray60"))
```

Invertebrates are the toughest species to get records for (1 out of 1 algae species also has no records) - surprise! People would much rather chronicle their encounters with bobcats than with crickets.

Of the species we do have records for, how many are being observed in the urban areas vs. habitat patches?

```{r}
la_records <- available %>% 
  filter(records == "Yes") %>%
  group_by(taxa) %>%
  summarise(in_patches = sum(in_patches, na.rm = TRUE),
            in_urban = sum(in_urban, na.rm = TRUE)) %>%
  gather(key = "location", value = "records", 2:3)

  
ggplot(la_records, aes(x = taxa, y = records, fill = location)) +
  geom_col(position = "fill", width = 0.3) +
  geom_text(aes(label = records), size = 3.5, position = position_fill(vjust = 0.5)) +
  scale_fill_manual(name = "",
                    labels = c("Habitat patches", "Urban areas"),
                    values = c("#7dc383", "#D8D8CD")) +
  coord_flip() +
  labs(title = "Where are species spotted in the City of LA?", subtitle = "Labels indicate total iNaturalist records per taxonomic group", x = "", y = "% of species") +
  theme_minimal() +
  theme(plot.title = element_text(vjust = 2, face = "bold"),
        plot.subtitle = element_text(color = "gray60"),
        axis.title.x = element_text(vjust = -1, color = "gray60"))

```

Interesting! Observers spot more mammals, invertebrates, and birds in urban areas than they do in habitat patches. There are a few reasonable explanations for this, one obviously being that 12% of the indicator species are in the "common native in urban area" group. Also, some mammals, like the invasive fox squirrel, truly are more common in urban areas. But for other species, this difference between urban and patch observations could be a reflection of sampling bias. Meanwhile, amphibians were more likely to be observed in habitat patches - probably something to do with their very specific habitat needs (I believe all of LA's riparian resources are all classified as habitat patches). The low proportion of plants recorded in urban areas could be due to the dominance of non-native ornamentals in managed landscapes, or our exclusion of records of "captive" species, or both.


<hr>
<b>What's Next</b>

Part two is next! Once we have some scoring methods, I have a pretty good idea of how I might put together the script. The analysis itself probably won't be so complex, but like the procedures in Part One, it will likely be tested and changed multiple times as the methods are developed. Writing the script so far has been very educational, and I think it will help the rest of the process go much smoother! Conversely, the more I work with the loops, the more I understand all the little places where the code can be pruned and streamlined. So I hope that working through the next part of this project will enable me to go back and improve the first half.

There are things to improve on the maps themselves as well. Certain issues that feel relatively small, but still escape me. These include italicizing just part of a title in a tmap object, and customizing a pdf layout that includes tmaps (e.g. adding a title and some text). I would really like to make the scored maps interactive - though I'm not totally sure how I would share them if I continue to use tmap (maybe just the HTML page?). That brings me to the next item on my wish list: expanding my mapping-in-R skillset beyond tmap. Thus far, it's been my favorite because it so easily handles shapefiles. ggplot seems not so straightforward when it comes to polygons, and gganimate is intimidating. But tmap is not without its limitations, so I'm excited to try and wrangle the other methods! 


<hr>
<i>All species data from iNaturalist.</i> http://www.inaturalist.org
