There are dozens of spatial analyses you can perform, but let’s start with some basic ones. You’ll continue working with the cleaned bat occurrence data for the mainland U.S. One of the most common and straightforward analyses is to count the number of species and total occurrences within each state.
First, load all the necessary libraries and import the data. Since the occurrence points were saved as an RDS file, you’ll use the function readRDS() to load them.
library(sf)
library(raster)
library(dplyr)
library(ggplot2)
library(tidyr)
US_border <- read_sf("../shp/us-state-boundaries/us-state-boundaries.shp")
occ <- readRDS("../GBIF/occ_final.rds")
Now, you’ll crop the U.S. polygon using a bounding box (bbox) to retain only the mainland states. Then, you’ll transform the occurrences into a spatial object.
US_Crop <- st_bbox(c(xmin = -125, ymin = 22,
xmax = -66, ymax = 49),
crs = 4326)
US_border <- st_crop(US_border, US_Crop)
occ_sf <- st_as_sf(occ, coords = c("X", "Y"), crs = 4326)
occ_sf <- st_transform(occ_sf, st_crs(US_border)) # same projection with the polygon
Now, let’s perform the count. There are several ways to do this, but in this case, you’ll use a pipeline that simplifies the process.
This line of code creates a summary table that shows how many bat occurrences and how many different species were recorded in each U.S. state. It starts by using st_join() to combine the occurrence points (occ_sf) with the state polygons (US_border), so each point inherits information about the state it falls within. Then, st_drop_geometry() removes the spatial (coordinate) information, since it’s not needed for the summary. The data is grouped by state and name, which allows us to count separately for each state. The summarise() function then calculates two things: the total number of records (n_occurrences = n()) and the number of unique species in each state (n_species = n_distinct(species)). Finally, ungroup() is used to remove the grouping structure, so the resulting data frame can be used more flexibly afterward.
state_counts <- occ_sf %>%
st_join(US_border) %>% #joint both points and polygons
st_drop_geometry() %>% # remove geometry (coordinates)
group_by(state, name) %>% # group all the information by state and name
summarise(
n_occurrences = n(), #count the number of records
n_species = n_distinct(species) # count unique species names
) %>%
ungroup()
state_counts
## # A tibble: 50 × 4
## state name n_occurrences n_species
## <chr> <chr> <int> <int>
## 1 01 Alabama 842 12
## 2 04 Arizona 5553 35
## 3 05 Arkansas 1132 14
## 4 06 California 15202 37
## 5 08 Colorado 3454 22
## 6 09 Connecticut 112 8
## 7 10 Delaware 6 3
## 8 11 District of Columbia 1 1
## 9 12 Florida 1335 16
## 10 13 Georgia 250 12
## # ℹ 40 more rows
This lines creates a new spatial object called US_bats, which contains the map of U.S. states along with the bat occurrence and species count data. It starts with US_border, which is the polygon layer of U.S. states, and uses left_join() to attach the summary data from state_counts by matching the state and name columns. This means that each state in the map will now include information about how many occurrences and species were found there. However, not all states may have bat data, so replace_na() is used to fill in zeros (0) for any missing values in the columns n_occurrences and n_species, instead of leaving them as NA (not available).
US_bats <- US_border %>%
left_join(state_counts, by = c("state", "name")) %>%
replace_na(list(n_occurrences = 0, n_species = 0))
US_bats$n_occurrences
US_bats$n_species
Now, you have a well-structured spatial object with biodiversity information (number of species and occurrences). So, let’s create a nice plot using this new data.
ggplot()+
geom_sf(data = US_bats, aes(fill= n_species))+
scale_fill_continuous(name="N° of Species")+
theme_minimal()+
theme(legend.position = "bottom")+
labs(title = "Bat species by state",
subtitle = "Data from GBIF")
ggplot()+
geom_sf(data = US_bats, aes(fill= n_occurrences))+
scale_fill_continuous(name="N° of Occurrences")+
theme_minimal()+
theme(legend.position = "bottom")+
labs(title = "Bat occurrences by state",
subtitle = "Data from GBIF")
Finally, you can save this new spatial object to use it in other software like QGIS. You can also convert all the information into a regular table and save it as a CSV file.
st_write(US_bats, "../shp/US_bats.gpkg")
US_bata_data <- data.frame(State= US_bats$name,
Species = US_bats$n_species,
Occurrences = US_bats$n_occurrences)
head(US_bata_data)
write.csv(US_bata_data, "../GBIF/US_bats_data.csv", quote = F, row.names = F)