Evaluating species distribution poses significant challenges due to the biogeographical complexity spanning diverse habitats. Key obstacles include accessibility to remote regions, the need for substantial resources, political issues, variations in research infrastructure, cryptic or rare species, and temporal and climatic changes. Geographical variability and ecosystem diversity necessitate collaborative approaches and innovative technologies to obtain comprehensive and representative data on a global scale.
In this context, databases storing species occurrence records, and in some cases, abundance and biomass data, provide a broader perspective, enhancing information gathered through specific research surveys. These databases are crucial for identifying areas of high diversity and establishing conservation strategies for threatened species. Additionally, they enable the assessment of evolutionary biology through biogeographic patterns, planning of agricultural and fishing activities, urban planning, and promotion of environmental education.
In the field of statistical programming with R, we have packages such as dismo, rgbif, and ntbox that enable access to valuable information from the Global Biodiversity Information Facility (GBIF). GBIF collaborates with partners offering online access to other databases like the USGS Bison database, iDigBio, and iNaturalist which records similar information. Specific resources, such as eBird, ALA, VertNet, and OBIS, can be accessed directly using the spocc package. This diversity of sources enriches data access and expands research and conservation possibilities, providing a comprehensive perspective on biodiversity distribution. This benefits both the scientific community and informed decision-making in environmental and sustainable development issues.
It is crucial to note that the information stored in these databases may contain errors, emphasizing the necessity to review the records using the appropriate management of these platforms. This involves eliminating data lacking longitude and latitude information, as well as suspicious records that do not include individual organism counts, registration codes, or the name of the database. Additionally, it is recommended to incorporate information available in grey literature, such as reports and theses, directly into the selected database for the work. Furthermore, reviewing information provided in publications focused on the species of interest, as well as the communities and environments where they have been recorded or could be found, is advised. Lastly, it is important that all biodiversity assessments include, as part of their objectives, the proper recording of species capture information in the database, aligning with the intrinsic characteristics of the species or communities being evaluated.
The following lines of R code will explore various options to download and plot this data using basic R functions, as well as ggplot2, maps, and raster packages. First, the approach that has GBIF as its main database is presented:
g1 = gbif('Callinectes','similis', geo = TRUE, ntries = 1, nrecs = 300)
## 59349 records found
## 0-300-600-900-1200-1500-1800-2100-2400-2700-3000-3300-3600-3900-4200-4500-4800-5100-5400-5700-6000-6300-6600-6900-7200-7500-7800-8100-8400-8700-9000-9300-9600-9900-10200-10500-10800-11100-11400-11700-12000-12300-12600-12900-13200-13500-13800-14100-14400-14700-15000-15300-15600-15900-16200-16500-16800-17100-17400-17700-18000-18300-18600-18900-19200-19500-19800-20100-20400-20700-21000-21300-21600-21900-22200-22500-22800-23100-23400-23700-24000-24300-24600-24900-25200-25500-25800-26100-26400-26700-27000-27300-27600-27900-28200-28500-28800-29100-29400-29700-30000-30300-30600-30900-31200-31500-31800-32100-32400-32700-33000-33300-33600-33900-34200-34500-34800-35100-35400-35700-36000-36300-36600-36900-37200-37500-37800-38100-38400-38700-39000-39300-39600-39900-40200-40500-40800-41100-41400-41700-42000-42300-42600-42900-43200-43500-43800-44100-44400-44700-45000-45300-45600-45900-46200-46500-46800-47100-47400-47700-48000-48300-48600-48900-49200-49500-49800-50100-50400-50700-51000-51300-51600-51900-52200-52500-52800-53100-53400-53700-54000-54300-54600-54900-55200-55500-55800-56100-56400-56700-57000-57300-57600-57900-58200-58500-58800-59100-59349 records downloaded
dim(g1)
## [1] 59349 157
head(g1[,1:5])
## acceptedNameUsage acceptedScientificName acceptedTaxonKey
## 1 <NA> Callinectes similis Williams, 1966 2225642
## 2 <NA> Callinectes similis Williams, 1966 2225642
## 3 <NA> Callinectes similis Williams, 1966 2225642
## 4 <NA> Callinectes similis Williams, 1966 2225642
## 5 <NA> Callinectes similis Williams, 1966 2225642
## 6 <NA> Callinectes similis Williams, 1966 2225642
## accessRights adm1
## 1 <NA> Georgia
## 2 <NA> Alabama
## 3 <NA> Alabama
## 4 <NA> Florida
## 5 <NA> South Carolina
## 6 <NA> North Carolina
Checking the dimensions of the data frame, downloading it later, and sorting it while removing NA values in the essential categories reveal differences between the total number of records and the number of reliable records needed to achieve this goal.
dim(g1)
## [1] 59349 157
g11 = g1[!is.na(g1$individualCount) & !is.na(g1$lon) & !is.na(g1$lat) & !is.na(g1$datasetName),]
dim(g11)
## [1] 742 157
plot(g11$lon, g11$lat, col = "red", pch = 19, xlab = "Longitude", ylab = "Latitude", xlim = c(-100, -65), ylim = c(17, 40),
main = expression(paste("Distribution data for ", italic("Callinectes similis"))))
legend("topright", legend = expression(italic("C. similis")),
col = c("red"), pch = 19, box.lwd = 1)
map(add=T)
Figure 1. Distribution data for Callinectes similis available from the GBIF database
map_data <- map_data("world")
MapG1 = ggplot(g11, aes(x = lon, y = lat)) +
geom_point(aes(color = individualCount), size = 2) +
#geom_text(aes(label = seq), nudge_x = 0.5, nudge_y = 0.1) +
coord_cartesian(ylim=c(17,45),xlim =c(-100,-65)) +
labs( x = "Longitude", y = "Latitude", color = "Count") +
theme_classic() +
theme(panel.border = element_rect(color = "black", fill = NA, size = 1)) +
theme(text= element_text(size=12, family = "Arial", colour = "black"),
axis.text.x = element_text(angle = 0,hjust = 0.5, vjust = 0.5, face = "plain", colour = "black", size = 17),
axis.text.y = element_text(angle = 0,hjust = 0.5, vjust = 0.5, face = "plain",colour = "black", size = 17),
axis.title.x = element_text(angle = 0,hjust = 0.5, vjust = 0.5, face = "plain",colour = "black", size = 17),
axis.title.y = element_text(angle = 90,hjust = 0.5, vjust = 0.5, face = "plain",colour = "black", size = 17))+
borders("world", colour = "black", size= 1) +
ggtitle(expression(paste("Distribution data for ", italic("Callinectes similis")))) +
theme(plot.title = element_text(hjust = 0.5))
MapG1 +
geom_polygon(data = map_data, aes(x = long, y = lat, group = group), fill = "grey", color = "black", alpha = 0.5)
Figure 2. Distribution data for Callinectes similis available from the GBIF database
This package provides options to retrieve data based on depth, elevation, and geographic area among others. In this context, these options are evaluated using the depth distribution of Achelous spinicarpus Stimpson, 1871 (0-521 meters) and Achelous spinimanus Latreille, 1819 (0-393 meters).
g21 = occ_data(scientificName = "Achelous spinicarpus", hasCoordinate = TRUE, limit = 30000)
g21 = data.frame(g21$data)
head(g21[,1:5])
## key scientificName decimalLatitude
## 1 4430762861 Achelous spinicarpus Stimpson, 1871 26.920171
## 2 3743232619 Achelous spinicarpus Stimpson, 1871 26.939775
## 3 3947829825 Achelous spinicarpus Stimpson, 1871 28.481204
## 4 3913938976 Achelous spinicarpus Stimpson, 1871 7.128995
## 5 3913938988 Achelous spinicarpus Stimpson, 1871 7.071733
## 6 3913939118 Achelous spinicarpus Stimpson, 1871 7.183760
## decimalLongitude issues
## 1 -96.53049 cdround,cudc
## 2 -84.08257 cdround,cudc
## 3 -96.42286 cdc,cdround
## 4 -55.93917 cdround,depmms,depunl,osiic
## 5 -56.12123 cdround,depmms,depunl,osiic
## 6 -56.24220 cdround,depmms,depunl,osiic
dim(g21)
## [1] 1428 158
g211 = g21[!is.na(g21$individualCount) & !is.na(g21$decimalLongitude) & !is.na(g21$decimalLatitude) & !is.na(g21$datasetName),]
dim(g211)
## [1] 515 158
g22 = occ_data(scientificName = "Achelous spinimanus", hasCoordinate = TRUE, limit = 30000)
g22 = data.frame(g22$data)
dim(g22)
## [1] 1140 167
g221 = g22[!is.na(g22$individualCount) & !is.na(g22$decimalLongitude) & !is.na(g22$decimalLatitude) & !is.na(g22$datasetName),]
dim(g221)
## [1] 418 167
Here, we can examine the depth range of the registered data for A. spinicarpus (values in the first vector) and compare it to the recorded data for A. spinimanus (values in the second vector).
sort(unique(g211$depth))
## [1] 0.00 0.05 2.50 3.00 3.75 5.25 6.00 14.20 15.00
## [10] 16.00 18.00 19.60 22.50 23.00 24.00 24.72 25.00 25.10
## [19] 26.50 26.75 27.00 27.75 28.00 30.40 31.50 32.00 33.00
## [28] 33.50 35.00 37.00 37.50 38.00 38.40 38.50 39.00 40.00
## [37] 41.00 41.50 42.00 43.00 44.00 45.00 45.70 46.00 47.00
## [46] 48.00 48.50 49.00 49.40 50.00 50.20 51.00 52.00 53.00
## [55] 53.50 54.00 55.00 55.50 56.00 56.70 57.00 57.50 58.00
## [64] 58.50 58.60 59.00 59.50 60.00 61.00 62.20 63.00 64.00
## [73] 65.00 65.80 66.00 67.00 68.00 69.00 69.50 70.00 71.00
## [82] 71.30 72.00 73.00 75.00 76.20 77.00 78.00 79.00 80.50
## [91] 82.00 85.00 86.00 88.00 89.50 90.00 91.00 91.50 92.00
## [100] 94.00 95.00 96.00 97.00 100.00 100.50 101.00 102.00 105.00
## [109] 106.00 108.00 109.50 110.00 113.00 116.00 116.50 119.00 120.00
## [118] 121.00 123.00 123.50 124.00 125.00 125.50 126.50 127.00 128.00
## [127] 137.00 139.00 146.00 152.50 155.00 159.00 161.00 164.50 165.00
## [136] 174.00 177.00 183.00 185.00 201.00 205.50 227.00 228.50 229.00
## [145] 233.50 237.50 238.00 259.00 260.50 270.00 283.00 290.00 301.50
## [154] 320.00 329.00 334.00 357.00 366.00 379.00 384.00 401.50 411.00
## [163] 412.00 415.00 421.00 425.00 457.00 457.50 549.00 3658.00
sort(unique(g221$depth))
## [1] 0.00 0.25 0.50 1.00 2.00 5.00 6.00 6.30 7.00 7.90
## [11] 8.00 8.50 9.00 9.60 10.00 11.00 11.25 12.00 13.00 13.26
## [21] 14.17 15.00 15.20 15.50 16.00 16.50 17.70 17.83 18.00 18.10
## [31] 19.00 19.60 20.00 21.00 22.00 22.50 23.00 24.00 24.50 25.00
## [41] 25.20 26.00 27.00 27.75 29.00 30.00 30.40 31.00 31.50 32.00
## [51] 33.00 34.50 35.00 36.50 37.00 37.50 40.00 42.00 44.00 49.00
## [61] 49.50 50.20 51.00 52.00 53.00 55.00 55.50 56.00 57.00 57.50
## [71] 59.50 60.00 65.00 66.00 68.00 91.00 121.00 152.50 165.00 196.50
## [81] 293.00 320.00 366.00 393.00
Considering the available information, the recorded depth range for those two species is deemed accurate. The subsequent representation of these data utilizes basic functions and the maps package. The following maps present all the data without sorting them according to depth. This approach is justified by the shared habitat of the two species, making it more crucial to visualize the areas where both species are present.
summary(g211[,3:4])
## decimalLatitude decimalLongitude
## Min. :-32.68 Min. :-97.74
## 1st Qu.: 19.00 1st Qu.:-87.62
## Median : 24.75 Median :-82.85
## Mean : 20.63 Mean :-79.54
## 3rd Qu.: 26.76 3rd Qu.:-76.57
## Max. : 35.79 Max. : 0.00
summary(g221[,3:4])
## decimalLatitude decimalLongitude
## Min. :-32.78 Min. :-97.74
## 1st Qu.: 20.95 1st Qu.:-88.04
## Median : 25.88 Median :-82.25
## Mean : 22.63 Mean :-79.63
## 3rd Qu.: 30.23 3rd Qu.:-77.04
## Max. : 40.47 Max. : 0.00
plot(g211$decimalLongitude, g211$decimalLatitude, col = "red", pch = 16, cex = 0.5, xlab = "Longitude", ylab = "Latitude", xlim = c(-100,0), ylim = c(-32, 45), main = expression(paste("Distribution data for ", italic("A. spinicarpus")," (red) and ", italic("A. spinimanus")," (orange)")))
points(g221$decimalLongitude, g221$decimalLatitude, col = "orange", pch = 17, cex = 0.5)
legend("bottomright", legend = expression(italic("A. spinicarpus"), italic("A. spinimanus")),
col = c("red", "orange"), pch = c(16, 17), box.lwd = 1)
map(add=T)
Figure 3. Distribution data for Achelous spinicarpus and Achelous spinimanus available from the GBIF database
Four genera and four species were observed due to the presence of synonymized names. The data for the species Portunus vossi Lemaitre, 1991 represents Achelous spinimanus as it is a synonymized name.
unique(g211$scientificName)
## [1] "Portunus spinicarpus (Stimpson, 1871)"
## [2] "Achelous spinicarpus Stimpson, 1871"
length(g211$scientificName)
## [1] 515
unique(g221$scientificName)
## [1] "Achelous spinimanus (Latreille, 1819)"
## [2] "Portunus spinimanus Latreille, 1819"
## [3] "Portunus vossi Lemaitre, 1991"
length(g221$scientificName)
## [1] 418
g211$scientificName = rep("A. spinicarpus", 515)
unique(g211$scientificName)
## [1] "A. spinicarpus"
g221$scientificName = rep("A. spinimanus", 418)
unique(g221$scientificName)
## [1] "A. spinimanus"
map_data <- map_data("world")
MapG2 = ggplot() +
geom_point(data = g211, aes(x = decimalLongitude, y = decimalLatitude, color = scientificName, size = individualCount)) +
geom_point(data = g221, aes(x = decimalLongitude, y = decimalLatitude, color = scientificName, size = individualCount)) +
#geom_text(aes(label = seq), nudge_x = 0.5, nudge_y = 0.1) +
coord_cartesian(xlim = c(-100,0), ylim = c(-32, 45)) +
labs( x = "Longitude", y = "Latitude", color = c("Species"), size = "Count") +
theme_classic() +
theme(panel.border = element_rect(color = "black", fill = NA, size = 1)) +
theme(text= element_text(size=12, family = "Arial", colour = "black"),
axis.text.x = element_text(angle = 0,hjust = 0.5, vjust = 0.5, face = "plain", colour = "black", size = 17),
axis.text.y = element_text(angle = 0,hjust = 0.5, vjust = 0.5, face = "plain",colour = "black", size = 17),
axis.title.x = element_text(angle = 0,hjust = 0.5, vjust = 0.5, face = "plain",colour = "black", size = 17),
axis.title.y = element_text(angle = 90,hjust = 0.5, vjust = 0.5, face = "plain",colour = "black", size = 17))+
borders("world", colour = "black", size= 1) +
ggtitle(expression(paste("Distribution data for ", italic("A. spinicarpus")," and ",italic("A. spinimanus")))) +
theme(plot.title = element_text(hjust = 0.5)) +
guides(
color = guide_legend(
label.theme = element_text(face = "italic")),
size = guide_legend(
label.theme = element_text(face = "plain")))
MapG2 +
geom_polygon(data = map_data, aes(x = long, y = lat, group = group), fill = "grey", color = "black", alpha = 0.5)
Figure 4. Distribution data for Achelous spinicarpus and Achelous spinimanus available from the GBIF database
This package features a graphical user interface; nevertheless, in this tutorial, we exclusively employ the function search_gbif_data for obtaining distribution data, but leafletplot parameter is used. Additionally, the package offers functions to retrieve environmental variables (e.g., WorldClim).
g3 = search_gbif_data("Phyllospongia","foliascens", occlim = 10000, writeFile = FALSE, leafletplot = FALSE, showClusters = FALSE)
head(g3[,1:5])
## name longitude latitude issues prov
## 1 Phyllospongia foliascens (Pallas, 1766) 125.6085 -8.522772 cdc,cdround gbif
## 2 Phyllospongia foliascens (Pallas, 1766) 177.3317 -16.949136 cdc,cdround gbif
## 3 Phyllospongia foliascens (Pallas, 1766) 177.3323 -16.943435 cdc,cdround gbif
## 4 Phyllospongia foliascens (Pallas, 1766) 120.8912 13.686359 cdc,cdround gbif
## 5 Phyllospongia foliascens (Pallas, 1766) 120.8912 13.686359 cdc,cdround gbif
## 6 Phyllospongia foliascens (Pallas, 1766) 120.8912 13.686359 cdc,cdround gbif
mg3 = leaflet::leaflet(g3)
mg31 <- mg3 %>% leaflet::addTiles()
mg32 = mg31 %>% leaflet::addCircleMarkers(lng = ~ longitude,
lat = ~latitude,
fillOpacity = 0.25,
radius = 2,col="firebrick")
mg32
Figure 5. Distribution data of Phyllospongia foliascens available from the GBIF database
dim(g3)
## [1] 337 154
g31 = g3[!is.na(g3$individualCount) & !is.na(g3$longitude) & !is.na(g3$latitude) & !is.na(g3$datasetName),]
dim(g31)
## [1] 2 154
According to the database, Phyllospongia foliascens Pallas, 1766, and considering the four columns, the majority of records are not reliable.
Similar to rgbif and ntbox, it is possible to select a geographic area to download the data. In this case, it was used a shapefile of the Caribbean Island/Island Group Management Areas Map & GIS Data available through the NOAA FISHERIES website NOAA FISHERIES website. However, at the time of this tutorial, the function was not recognized by R, despite uninstalling and reinstalling the package, which may or may not be due to an issue on my end.
shp = sf::st_read(dsn ="C:/shapefile/CaribbeanManagementAreas_po.shp")
## Reading layer `CaribbeanManagementAreas_po' from data source
## `C:\shapefile\CaribbeanManagementAreas_po.shp' using driver `ESRI Shapefile'
## Simple feature collection with 3 features and 2 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -68.48189 ymin: 14.92819 xmax: -63.88889 ymax: 21.85425
## Geodetic CRS: NAD83
plot(shp)
Figure 6. Shapefile of Puerto Rico, St. Thomas, St. John, and St. Croix islands
#g4 = obtainGBIF("Dendrogyra cylindrus", shp)
This package offers the option to select the study area and database for obtaining species distribution data. Here, all databases are presented; however, it’s important to note that the focus is on sponge species XD. It’s worth mentioning that eBird and VertNet data may not provide relevant information as they are specialized for vertebrate species.
g4 = occ(query = "Dendrogyra cylindrus", from = c('gbif', 'bison', 'inat', 'ebird', 'ecoengine','vertnet', 'obis'),
limit = 10000, geometry = shp)
g4
## Searched: gbif, bison, inat, ebird, ecoengine, vertnet, obis
## Occurrences - Found: 80, Returned: 80
## Search type: Scientific
## inat: Dendrogyra cylindrus (79)
## vertnet: Dendrogyra cylindrus (1)
##
## Note: spocc cannot estimate complete additional records found as none available for ebird
To obtain all information with this package, it is necessary to use the functions data.frame and occ2df. Additionally, acquiring data for this species from the VertNet database is uncommon. This package provides data with different column names than those used above.
g4 = occ2df(g4)
head(g4)
## # A tibble: 6 × 6
## name longitude latitude prov date key
## <chr> <chr> <chr> <chr> <date> <chr>
## 1 Dendrogyra cylindrus -64.9017938019 18.3617737391 inat 2015-08-01 191491078
## 2 Dendrogyra cylindrus -64.9476047061 18.346176467 inat 2016-02-13 187244096
## 3 Dendrogyra cylindrus -64.8867011932 18.2333751163 inat 2015-12-06 185014583
## 4 Dendrogyra cylindrus -64.8995158985 18.3018340423 inat 2015-12-06 185014063
## 5 Dendrogyra cylindrus -64.9314428336 18.3330347611 inat 2015-11-27 184737334
## 6 Dendrogyra cylindrus -64.8065847244 18.3507824507 inat 2015-10-18 183108013
dim(g4)
## [1] 80 6
g4$longitude = as.numeric(g4$longitude)
g4$latitude = as.numeric(g4$latitude)
g41 = g4[!is.na(g4$longitude) & !is.na(g4$latitude) & !is.na(g4$prov),]
dim(g41)
## [1] 79 6
summary(g41[,2:3])
## longitude latitude
## Min. :-68.38 Min. :17.66
## 1st Qu.:-64.96 1st Qu.:18.24
## Median :-64.88 Median :18.30
## Mean :-65.05 Mean :18.30
## 3rd Qu.:-64.75 3rd Qu.:18.35
## Max. :-64.47 Max. :18.68
plot(g41$longitude, g41$latitude, col = "limegreen", pch = 19, xlab = "Longitude", ylab = "Latitude", xlim = c(-70, -58), ylim = c(19, 15),
main = expression(paste("Distribution data for ", italic("Dendrogyra cylindrus"))))
legend("topright", legend = expression(italic("D. cylindrus")),
col = c("limegreen"), pch = 19, box.lwd = 1)
map(add=T)
map_data <- map_data("world")
MapG3 = ggplot() +
geom_point(data = g41, aes(x = longitude, y = latitude, color = name)) +
coord_cartesian(xlim = c(-70, -58), ylim = c(19, 15)) +
labs( x = "Longitude", y = "Latitude", color = c("Species"), size = "Conteo") +
theme_classic() +
theme(panel.border = element_rect(color = "black", fill = NA, size = 1)) +
theme(text= element_text(size=12, family = "Arial", colour = "black"),
axis.text.x = element_text(angle = 0,hjust = 0.5, vjust = 0.5, face = "plain", colour = "black", size = 17),
axis.text.y = element_text(angle = 0,hjust = 0.5, vjust = 0.5, face = "plain",colour = "black", size = 17),
axis.title.x = element_text(angle = 0,hjust = 0.5, vjust = 0.5, face = "plain",colour = "black", size = 17),
axis.title.y = element_text(angle = 90,hjust = 0.5, vjust = 0.5, face = "plain",colour = "black", size = 17))+
borders("world", colour = "black", size= 1) +
ggtitle(expression(paste("Distribution data for ", italic("Dendrogyra cylindrus")))) +
theme(plot.title = element_text(hjust = 0.5)) + theme(legend.text = element_text(face = "italic"))
MapG3 +
geom_polygon(data = map_data, aes(x = long, y = lat, group = group), fill = "grey", color = "black", alpha = 0.5)
Figure 8. Distribution data for Dendrogyra cylindrus available from the inaturalist and vernet databases
There are various methods available for retrieving biodiversity information in Python, and it’s also possible to accomplish this in R by employing an approach related to web scraping. Notably, in both Python and R, there exist specialized packages to facilitate this process. In the case of Python, there are at least two notable packages designed for this purpose. In this tutorial, we will explore a specific method utilizing a web scraping function along with the functions provided by the pygbif package. pygbif is a valuable tool that interfaces with the Global Biodiversity Information Facility (GBIF) API, offering a seamless and efficient means of obtaining species occurrence data. By incorporating a web scraping function into the workflow, users can further enhance their ability to gather comprehensive biodiversity information for their projects. This tutorial will provide a step-by-step guide on implementing this combined approach for effective data retrieval
import requests
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pygbif import occurrences
import geopandas as gpd
import requests
def get_data(taxon_name):
url = f'https://api.inaturalist.org/v1/observations?taxon_name={taxon_name}&quality_grade=research&per_page=100'
response = requests.get(url)
if response.status_code == 200:
return response.json()
else:
print(f'Error fetching data: {response.status_code}')
# Data
taxon_name = 'Panulirus argus'
sp_data = get_data(taxon_name)
type(sp_data)
## <class 'dict'>
Converting dictionary to data frame
FuncDat = pd.json_normalize(sp_data['results'])
type(FuncDat)
## <class 'pandas.core.frame.DataFrame'>
FuncDat.head()
## quality_grade ... user.preferences.prefers_project_addition_by
## 0 research ... NaN
## 1 research ... NaN
## 2 research ... NaN
## 3 research ... NaN
## 4 research ... NaN
##
## [5 rows x 136 columns]
Checking columns
NCOL = FuncDat.columns
NCOL1 = list(NCOL)
NCOL1.sort()
print(NCOL1)
## ['annotations', 'cached_votes_total', 'captive', 'comments', 'comments_count', 'community_taxon_id', 'created_at', 'created_at_details.date', 'created_at_details.day', 'created_at_details.hour', 'created_at_details.month', 'created_at_details.week', 'created_at_details.year', 'created_time_zone', 'description', 'faves', 'faves_count', 'flags', 'geojson.coordinates', 'geojson.type', 'geoprivacy', 'id', 'ident_taxon_ids', 'identifications', 'identifications_count', 'identifications_most_agree', 'identifications_most_disagree', 'identifications_some_agree', 'license_code', 'location', 'map_scale', 'mappable', 'non_owner_ids', 'num_identification_agreements', 'num_identification_disagreements', 'oauth_application_id', 'obscured', 'observation_photos', 'observed_on', 'observed_on_details.date', 'observed_on_details.day', 'observed_on_details.hour', 'observed_on_details.month', 'observed_on_details.week', 'observed_on_details.year', 'observed_on_string', 'observed_time_zone', 'ofvs', 'outlinks', 'owners_identification_from_vision', 'photos', 'place_guess', 'place_ids', 'positional_accuracy', 'preferences.prefers_community_taxon', 'project_ids', 'project_ids_with_curator_id', 'project_ids_without_curator_id', 'project_observations', 'public_positional_accuracy', 'quality_grade', 'quality_metrics', 'reviewed_by', 'site_id', 'sounds', 'spam', 'species_guess', 'tags', 'taxon.ancestor_ids', 'taxon.ancestry', 'taxon.atlas_id', 'taxon.complete_species_count', 'taxon.created_at', 'taxon.current_synonymous_taxon_ids', 'taxon.default_photo.attribution', 'taxon.default_photo.flags', 'taxon.default_photo.id', 'taxon.default_photo.license_code', 'taxon.default_photo.medium_url', 'taxon.default_photo.original_dimensions.height', 'taxon.default_photo.original_dimensions.width', 'taxon.default_photo.square_url', 'taxon.default_photo.url', 'taxon.endemic', 'taxon.extinct', 'taxon.flag_counts.resolved', 'taxon.flag_counts.unresolved', 'taxon.iconic_taxon_id', 'taxon.iconic_taxon_name', 'taxon.id', 'taxon.introduced', 'taxon.is_active', 'taxon.min_species_ancestry', 'taxon.min_species_taxon_id', 'taxon.name', 'taxon.native', 'taxon.observations_count', 'taxon.parent_id', 'taxon.photos_locked', 'taxon.preferred_common_name', 'taxon.rank', 'taxon.rank_level', 'taxon.taxon_changes_count', 'taxon.taxon_schemes_count', 'taxon.threatened', 'taxon.universal_search_rank', 'taxon.wikipedia_url', 'taxon_geoprivacy', 'time_observed_at', 'time_zone_offset', 'updated_at', 'uri', 'user.activity_count', 'user.created_at', 'user.icon', 'user.icon_url', 'user.id', 'user.identifications_count', 'user.journal_posts_count', 'user.login', 'user.login_autocomplete', 'user.login_exact', 'user.name', 'user.name_autocomplete', 'user.observations_count', 'user.orcid', 'user.preferences.prefers_observation_fields_by', 'user.preferences.prefers_project_addition_by', 'user.roles', 'user.site_id', 'user.spam', 'user.species_count', 'user.suspended', 'user.universal_search_rank', 'uuid', 'votes']
Obtaining count data from a database
PAcounts = FuncDat['identifications_count']
type(PAcounts)
## <class 'pandas.core.series.Series'>
Obtaining coordinates data from a database
PAcoor = FuncDat['geojson.coordinates']
PAcoor[0:4]
## 0 [-86.742391102, 20.9353787156]
## 1 [-86.8031357601, 20.9219580775]
## 2 [-86.4329452, 16.3593807]
## 3 [-61.7928594014, 12.007440603]
## Name: geojson.coordinates, dtype: object
Extracting longitude data
Longitude = [x[0] for x in PAcoor]
type(Longitude)
## <class 'list'>
Longitude[0:4]
## [-86.742391102, -86.8031357601, -86.4329452, -61.7928594014]
Extracting latitude data
Latitude = [y[1] for y in PAcoor]
type(Latitude)
## <class 'list'>
Latitude[0:4]
## [20.9353787156, 20.9219580775, 16.3593807, 12.007440603]
Preparing to merge data.frame data and list data to create a data frame
PAdat = {'Longitude': Longitude, 'Latitude': Latitude, 'Counts': PAcounts}
Creating data frame
PAdf = pd.DataFrame(PAdat)
Removing NA values from the data frame
PAdf = PAdf.dropna()
PAdf.shape
## (100, 3)
Final data frame
PAdf.head()
## Longitude Latitude Counts
## 0 -86.742391 20.935379 1
## 1 -86.803136 20.921958 1
## 2 -86.432945 16.359381 1
## 3 -61.792859 12.007441 1
## 4 -87.026111 20.343889 2
naturalearth_lowres = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
fig, ax = plt.subplots(figsize=(10, 6))
naturalearth_lowres.plot(ax=ax, color='lightgray', edgecolor='black')
ax.set_xlim([-120, 50])
## (-120.0, 50.0)
ax.set_ylim([-50,60])
## (-50.0, 60.0)
plt.scatter(PAdf['Longitude'], PAdf['Latitude'], color = 'orange', s = 20)
plt.legend(['counts'], loc='lower right', bbox_to_anchor=(1, 0))
ax.set_xlabel('Longitude', color = 'black', size = 11)
ax.set_ylabel('Latitude', color = 'black', size = 11)
ax.tick_params(axis = 'x', colors = 'black', labelsize = 15)
ax.tick_params(axis = 'y', colors = 'black', labelsize = 15)
title = r'$\mathrm{Distribution}$ $\mathrm{for}$ $\it{Panulirus}$ $\it{argus}$'
ax.set_title(title, color = 'black', size = 15)
plt.show()
Figure 9. Distribution data for Panulirus argus available from the inaturalist database
occ_data = occurrences.search(scientificName = 'Periclimenes iridescens')
type(occ_data)
## <class 'dict'>
occ_df = pd.json_normalize(occ_data['results'])
type(occ_df)
## <class 'pandas.core.frame.DataFrame'>
NCOL = occ_df.columns
NCOL2 = list(NCOL)
NCOL2.sort()
print(NCOL2)
## ['acceptedNameUsage', 'acceptedScientificName', 'acceptedTaxonKey', 'accessRights', 'associatedReferences', 'basisOfRecord', 'bibliographicCitation', 'catalogNumber', 'class', 'classKey', 'collectionCode', 'collectionID', 'collectionKey', 'continent', 'coordinateUncertaintyInMeters', 'country', 'countryCode', 'county', 'crawlId', 'dataGeneralizations', 'datasetID', 'datasetKey', 'datasetName', 'dateIdentified', 'day', 'decimalLatitude', 'decimalLongitude', 'depth', 'depthAccuracy', 'disposition', 'distanceFromCentroidInMeters', 'dynamicProperties', 'elevation', 'elevationAccuracy', 'endDayOfYear', 'establishmentMeans', 'eventDate', 'eventID', 'eventRemarks', 'eventTime', 'extensions.http://rs.gbif.org/terms/1.0/Multimedia', 'extensions.http://rs.tdwg.org/ac/terms/Multimedia', 'extensions.http://rs.tdwg.org/dwc/terms/Identification', 'facts', 'family', 'familyKey', 'fieldNumber', 'footprintWKT', 'gadm.level0.gid', 'gadm.level0.name', 'gadm.level1.gid', 'gadm.level1.name', 'gadm.level2.gid', 'gadm.level2.name', 'gadm.level3.gid', 'gadm.level3.name', 'gbifID', 'genericName', 'genus', 'genusKey', 'geodeticDatum', 'georeferenceProtocol', 'georeferenceSources', 'georeferenceVerificationStatus', 'georeferencedBy', 'georeferencedDate', 'habitat', 'higherClassification', 'higherGeography', 'higherGeographyID', 'hostingOrganizationKey', 'http://unknown.org/language', 'http://unknown.org/orders', 'identificationID', 'identificationQualifier', 'identificationRemarks', 'identificationVerificationStatus', 'identifiedBy', 'identifiedByIDs', 'identifier', 'identifiers', 'individualCount', 'installationKey', 'institutionCode', 'institutionID', 'institutionKey', 'isInCluster', 'island', 'islandGroup', 'issues', 'iucnRedListCategory', 'key', 'kingdom', 'kingdomKey', 'language', 'lastCrawled', 'lastInterpreted', 'lastParsed', 'license', 'lifeStage', 'locality', 'locationAccordingTo', 'locationID', 'locationRemarks', 'materialSampleID', 'media', 'modified', 'month', 'municipality', 'nameAccordingTo', 'namePublishedInYear', 'networkKeys', 'nomenclaturalCode', 'occurrenceID', 'occurrenceRemarks', 'occurrenceStatus', 'order', 'orderKey', 'organismID', 'originalNameUsage', 'otherCatalogNumbers', 'ownerInstitutionCode', 'parentNameUsage', 'phylum', 'phylumKey', 'preparations', 'previousIdentifications', 'programmeAcronym', 'projectId', 'protocol', 'publishingCountry', 'publishingOrgKey', 'recordedBy', 'recordedByIDs', 'references', 'relations', 'rightsHolder', 'sampleSizeUnit', 'samplingProtocol', 'scientificName', 'scientificNameID', 'sex', 'species', 'speciesKey', 'specificEpithet', 'startDayOfYear', 'stateProvince', 'taxonID', 'taxonKey', 'taxonRank', 'taxonRemarks', 'taxonomicStatus', 'type', 'typeStatus', 'verbatimCoordinateSystem', 'verbatimDepth', 'verbatimEventDate', 'verbatimLocality', 'verbatimSRS', 'vernacularName', 'waterBody', 'year']
Fcol = ['decimalLongitude','decimalLatitude','occurrenceStatus','datasetName']
type(Fcol)
## <class 'list'>
Focc_df = occ_df[Fcol]
Focc_df.shape
## (224, 4)
Focc_df.head()
## decimalLongitude ... datasetName
## 0 -68.969640 ... NaN
## 1 -89.700444 ... Actualización del conocimiento de la diversida...
## 2 -89.714944 ... Actualización del conocimiento de la diversida...
## 3 -89.714944 ... NaN
## 4 -89.700444 ... NaN
##
## [5 rows x 4 columns]
Focc_df.columns
## Index(['decimalLongitude', 'decimalLatitude', 'occurrenceStatus',
## 'datasetName'],
## dtype='object')
type(Focc_df)
## <class 'pandas.core.frame.DataFrame'>
Focc_df['Values'] = 1
Focc_df['datasetName'].unique()
## array([nan,
## 'Actualización del conocimiento de la diversidad de especies de invertebrados marinos bentónicos de aguas someras (<50m) del Sur del Golfo de México',
## 'NOAA National Benthic Inventory',
## 'Digitalización y Sistematización de las Colecciones Biológicas Nacionales del Instituto de Biología, UNAM',
## 'Colección de Artropodos del Museo de Historia Natural Marina de Colombia - Makuriwa',
## 'CUBAGUA_CRUS',
## 'Crustáceos estomatópodos, anfípodos, isópodos y decápodos del litoral de Quintana Roo',
## 'Computarización de la Colección Nacional de Crustáceos del Instituto de Biología, UNAM y elaboración de su catálogo',
## 'NMNH Extant Biology'], dtype=object)
Focc_df = Focc_df.dropna()
Focc_df.shape
## (80, 5)
Focc_df.head()
## decimalLongitude ... Values
## 1 -89.700444 ... 1
## 2 -89.714944 ... 1
## 12 -89.682272 ... 1
## 14 -89.647583 ... 1
## 15 -90.283028 ... 1
##
## [5 rows x 5 columns]
# Crear la gráfica
naturalearth_lowres = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
fig, ax = plt.subplots(figsize=(10, 6))
naturalearth_lowres.plot(ax=ax, color='lightgray', edgecolor='black')
ax.set_xlim([-120, 50])
## (-120.0, 50.0)
ax.set_ylim([-50,60])
## (-50.0, 60.0)
plt.scatter(Focc_df['decimalLongitude'], Focc_df['decimalLatitude'], color = 'black', s = 20)
plt.legend(['counts'], loc='lower right', bbox_to_anchor=(1, 0))
ax.set_xlabel('Longitude', color = 'black', size = 11)
ax.set_ylabel('Latitude', color = 'black', size = 11)
ax.tick_params(axis = 'x', colors = 'black', labelsize = 15)
ax.tick_params(axis = 'y', colors = 'black', labelsize = 15)
title = r'$\mathrm{Distribution}$ $\mathrm{for}$ $\it{Periclimenes}$ $\it{iridescens}$'
ax.set_title(title, color = 'black', size = 15)
plt.show()
Figure 10. Distribution data for Periclimenes iridescens available from the GBIF database
Anderson, J., Olsen, Z., Glen Sutton, T. W., Gelpi, C., & Topping, D. (2017). Environmental Drivers of the Spatial and Temporal Distribution of Spawning Blue Crabs Callinectes sapidus in the Western Gulf of Mexico. North American Journal of Fisheries Management, 37(4), 920–934. https://doi.org/10.1080/02755947.2017.1335255
Bernardo, C. H., Pereira Bernardes, V., de Sousa, A. N., Barros Rodrigues, G. F., da Silva, T. E., & Fransozo, A. (2019). Environmental factors modulating the bathymetric distribution of the demographic groups of Achelous spinimanus (Crustacea). Neotropical Biology and Conservation, 14(1), 13–28. https://doi.org/10.3897/neotropical.14.e34829
Briones-FourzÁn, P., Candela, J., & Lozano-Álvarez, E. (2008). Postlarval settlement of the spiny lobster Panulirus argus along the Caribbean coast of Mexico: Patterns, influence of physical factors, and possible sources of origin. Limnology and Oceanography, 53(3), 970–985. https://doi.org/10.4319/lo.2008.53.3.0970
Chamberlain, S., Oldoni, D., Barve, V., Desmet, P., Geffert, L., Mcglinn, D., & Ram, K. (2023). Interface to the Global Biodiversity Information Facility API. Package ‘rgbif’ (3.7.8) [Computer software].
Chandler, R. (2019). Modeling and mapping species distributions.
Cox, C., Hunt, J. H., Lyons, W. G., & Davis, G. E. (1997). Nocturnal foraging of the Caribbean spiny lobster (Panulirus argus) on offshore reefs of Florida, USA. Marine and Freshwater Research, 48(8), 671. https://doi.org/10.1071/MF97198
De Grave, S. (2014). A new species of crinoid-associated Periclimenes from Honduras (Crustacea: Decapoda: Palaemonidae). Zootaxa, 3793(5), 587. https://doi.org/10.11646/zootaxa.3793.5.6
Hijmans, R., & Elith, J. (2011). Species distribution modelling with R (p. 87).
Hijmans, R., Phillips, S., Leathwick, J., & Elith, J. (2020). Species Distribution Modeling. Package “dismo”. (1.3-3) [R language].
Hsueh, P.-W., McClintock, J. B., & Hopkins, T. S. (1993). Population Dynamics and Life History Characteristics of the Blue Crabs Callinectes similis and C. sapidus in Bay Environments of the Northern Gulf of Mexico. Marine Ecology, 14(3), 239–257. https://doi.org/10.1111/j.1439-0485.1993.tb00482.x
Jensen, O., Seppelt, R., Miller, T., & Bauer, L. (2005). Winter distribution of blue crab Callinectes sapidus in Chesapeake Bay: Application and cross-validation of a two-stage generalized additive model. Marine Ecology Progress Series, 299, 239–255. https://doi.org/10.3354/meps299239
Owens, H., Barve, V., Chamberlain, S., Ram, K., & Hart, T. (2023). Interface to Species Occurrence Data Sources. Package ‘spocc’ (1.2.2) [Computer software].
Pardal-Souza, A. L. (2011). Crescimento relativo e reprodução em Achelous spinicarpus (Crustacea: Portunidae), na Plataforma Continental Sudeste do Brasil [Ungraduate, Universidade Estadual Paulista]. http://rgdoi.net/10.13140/RG.2.2.18550.80964
Pardal-Souza, A. L., & Pinheiro, M. A. A. (2013). Relative growth and reproduction in Achelous spinicarpus (Crustacea: Portunidae) on the south-eastern continental shelf of Brazil. Journal of the Marine Biological Association of the United Kingdom, 93(3), 667–674. https://doi.org/10.1017/S0025315412000318
Parra-Velandia, F. J., Zea, S., & Van Soest, R. W. M. (2014). Reef sponges of the genus Agelas (Porifera: Demospongiae) from the Greater Caribbean. Zootaxa, 3794(3), 301. https://doi.org/10.11646/zootaxa.3794.3.1
Rocha-Ramirez, A., Chazaro-Olvera, S., & Roman-Contreras, R. (2000). Observations on feeding, maturity, and fecundity of Callinectes similis williams, 1966, on the central continental shelf off Veracruz, Gulf of Mexico. Crustaceana, 73(3), 323–332. https://doi.org/10.1163/156854000504417
Román-Contreras, R., & Martínez-Mayén, M. (2009). Reproduction of Periclimenes Siankaanensis (Decapoda, Caridea, Palaemonidae) in Bahía De La Ascensión, Quintana Roo, Mexico. Crustaceana, 82(1), 27–37. https://doi.org/10.1163/156854008X389599
Sousa, A. N. de, Bertini, G., Taddei, F. G., da Costa, R. C., Silva, T. E. da, & Fransozo, A. (2018). Modulating factors of the abundance and distribution of Achelous spinimanus (Latreille, 1819) (Decapoda, Portunoidea), a fishery resource, in Southeastern Brazil. PeerJ, 6, e5720. https://doi.org/10.7717/peerj.5720
Trimm, N. A., & Hayes, F. E. (2008). Distributional Ecology of the anemone shrimp Periclimenes rathbunae associating with the sea anemone Stichodactyla helianthus at Tobago. Ushey, K., Allaire, J., Tang, Y., Eddelbuettel, E., Lewis, B., Keydana, S., Hafen, R., & Geelnard, M. (2020). Interface to “Python”. Package ‘reticulate.’
Wickham, H., Chang, W., Henry, L., Lin-Pedersen, T., Takahashi, K., Wilke, C., Woo, W., Yutani, H., & Dunnington, D. (2023). Create Elegant Data Visualisations Using the Grammar of Graphics. Package ‘ggplot2’ [Computer software]. Wilkinson, C. R., & Evans, E. (1989). Sponge distribution across Davies Reef, Great Barrier Reef, relative to location, depth, and water movement. Coral Reefs, 8(1), 1–7. https://doi.org/10.1007/BF00304685