This note introduces the srUtils package, a very early iteration of a set of functions designed to assist with systematic mapping exercises in ecology.
An essential part of a systematic literature review is the extraction of key information in coded format.
Some information can be readily extracted using regular expressions, and functions are available for some of these. Other kinds of information extraction require more sophisticated analysis using natural language processing (NLP), a form of AI. Using NLP we can extract locations, chemicals and taxa. This is more complex, and functions for it are not yet available in this package, although a walk-through is available.
Currently the package has 4 functions, all named in the format get_x. These are:
get_air_temperature - this identifies expressions of the form degrees C
get_chemical - this identifies simple expressions like N2O, CO2 and the term greenhouse gases
get_elevation - this identifies elevations, i.e. heights above sea level, in texts
get_coordinates - this detects latitudes and longitudes in various formats including polar and decimal coordinates and eastings and northings.
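To make the regular-expression approach concrete, here is a minimal sketch of how a temperature pattern might be matched in base R. It is not the srUtils implementation: the helper name extract_celsius and the pattern are assumptions made for illustration only.

```r
# Hypothetical helper (not part of srUtils): find temperatures such as "22 °C".
extract_celsius <- function(text) {
  # optional minus sign, digits with optional decimal part,
  # optional space, degree sign, optional space, capital C
  pattern <- "-?[0-9]+(\\.[0-9]+)?\\s?\u00b0\\s?C"
  hits <- regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
  data.frame(
    match = hits,
    # strip the unit to leave the numeric part of each match
    value = as.numeric(sub("\\s?\u00b0\\s?C$", "", hits, perl = TRUE)),
    stringsAsFactors = FALSE
  )
}

example <- "Soil was incubated at 22 \u00b0C and stored at -4.5\u00b0C."
extract_celsius(example)
```

A pattern this simple will also treat the hyphen in a range such as "10-20 °C" as a minus sign, so real text usually needs extra checks.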
The package is not on CRAN but can be installed from GitHub using devtools or remotes with the code below.
if (!require("remotes")) install.packages("remotes")
remotes::install_github("julianflowers/srUtils")
library(srUtils)
When using srUtils, the tidyverse package is very useful, as is readtext for reading in large numbers of text files (e.g. PDFs or Word documents).
library(tidyverse, quietly = TRUE); library(readtext)
To show how to use the package we will load some articles in pdf format.
path <- here::here("/Users/julianflowers/Library/Mobile Documents/com~apple~CloudDocs/Desktop/herbivores_ghg/my_corpus")
pdfs <- list.files(path, "pdf$", full.names = TRUE)
pdf_text <- map_dfr(pdfs, readtext)
el <- get_elevation(pdf_text$text[5])
paste(pdf_text$doc_id[5], el$value[[1]][3], el$value[[1]][4])
[1] "1-s2.0-S001670611200119X-main.pdf 52 m"
air_temp <- get_air_temperature(pdf_text$text[5])
air_temp |>
  unnest("value")
# A tibble: 11 × 2
name value[,1] [,2] [,3] [,4]
<int> <chr> <chr> <chr> <chr>
1 1 "22 °C. More than " 22 ° C
2 1 "10 °C. As grass requires a minimum temperature >" 10 ° C
3 1 "300 °C. Gases of known N" 300 ° C
4 1 "11 °C and WFPS was " 11 ° C
5 1 "22 °C. Such environmental\nMATLAB (Math works USA, " 22 ° C
6 1 "30\n Soil … 30 ° C
7 1 "17 °C to be suit-\n … 17 ° C
8 1 "20 °C) and an N application on\n … 20 ° C
9 1 "18 °C) and\n … 18 ° C
10 1 "16 °C and were observed b… 16 ° C
11 1 "18 °C and there was an N application on June … 18 ° C
chem <- get_chemical(pdf_text$text[5])
chem |>
  unnest("value") |>
  distinct()
# A tibble: 10 × 2
name value[,1]
<int> <chr>
1 1 "N2O"
2 1 "nitro"
3 1 "nitrog"
4 1 "NO"
5 1 "carbon"
6 1 " Monthly N2O flux"
7 1 "NO2"
8 1 "CO"
9 1 "greenhouse gases between terrestrial ecosystems"
10 1 "greenhouse effect."
coords <- get_coordinates(pdf_text$text[5])
coords |>
  unnest("value") |>
  distinct()
# A tibble: 2 × 2
name value[,1] [,2] [,3] [,4] [,5]
<int> <chr> <chr> <chr> <chr> <chr>
1 1 52°07 N 52 ° 07 N
2 1 08°16 W 08 ° 16 W
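For mapping, matches like these usually need converting to signed decimal degrees. The following is a minimal sketch, assuming the degrees, minutes and hemisphere parts have already been separated as in the columns above. dm_to_decimal is a hypothetical helper, not an srUtils function.

```r
# Hypothetical helper (not part of srUtils): degrees and minutes to decimal degrees.
dm_to_decimal <- function(degrees, minutes, hemisphere) {
  # one minute is 1/60 of a degree
  value <- as.numeric(degrees) + as.numeric(minutes) / 60
  # southern and western hemispheres are negative by convention
  ifelse(hemisphere %in% c("S", "W"), -value, value)
}

# The two matches shown above: 52°07 N and 08°16 W
dm_to_decimal(c("52", "08"), c("07", "16"), c("N", "W"))
```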
We can apply these functions to multiple papers at the same time using the map functions from the purrr package (part of the tidyverse).
coords2 <- pdf_text |>
  mutate(coords = map(text, get_coordinates))
coords3 <- coords2 |>
  unnest("coords") |>
  unnest_auto("value")
coords3$value |>
  as.matrix() |>
  data.frame()
X1 X2 X3 X4 X5
1 52°07 N 52 ° 07 N
2 08°16 W 08 ° 16 W
3 52° 8′ N 52 ° 8 N
4 54° 20′ N 54 ° 20 N
5 8° 19′ W 8 ° 19 W
6 568570 N, 4835W 568570 N 4835 W
7 458380 N, 28440 E 458380 N 8440 E
8 54.18◦ N 54 18 ◦ N
9 2.36◦ E 2 36 ◦ E
10 56◦ 29\004 N 56 ◦ 29 N
11 9◦ 34\004 E 9 ◦ 34 E
12 53°13′N 53 ° 13 N
13 4°0ʹW; Fig. S 4 ° 0 S
14 4°0ʹW 4 ° 0 W
15 52°18′N 52 ° 18 N
16 6°30′W 6 ° 30 W
17 50°45′N 50 ° 45 N
18 3°50′W 3 ° 50 W
19 52.30◦ N 52 30 ◦ N
20 6.40◦ W 6 40 ◦ W
21 52◦ 31’N 52 ◦ 31 N
22 0◦ 23’E 0 ◦ 23 E
23 1000 N, 3\001 540 0500 W 1000 N 0500 W
24 528160 N, 88250 W 528160 N 8250 W
25 53° 30’ N 53 ° 30 N
26 6° 10’ E 6 ° 10 E
27 55◦ 52’N 55 ◦ 52 N
28 03◦ 02’W 03 ◦ 02 W
29 51° 46'N 51 ° 46 N
30 9° 42'E 9 ° 42 E
31 N54:41:18 N 54 41 18
32 W2:22:45 W 2 22 45
33 8199N, 3u4490.8199W 8199 N 8199 W
34 10.17 °W 10 17 ° W
35 10.12 °W 10 12 ° W
36 10.17 °W 10 17 ° W
37 9.59 °W 9 59 ° W
38 9.55 °W 9 55 ° W
39 9.40 °W 9 40 ° W
40 9.43 °W 9 43 ° W
41 9.38 °W 9 38 ° W
42 51.47 °N 51 47 ° N
43 51.36 °N 51 36 ° N
44 51.58 °N 51 58 ° N
45 51.58 °N 51 58 ° N
46 51.44 °N 51 44 ° N
47 51.37 °N 51 37 ° N
48 51.35 °N 51 35 ° N
49 51.35 °N 51 35 ° N
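In the output above, decimal-degree matches such as "10.17 °W" appear split at the decimal point, so their parts look like degrees and minutes. One possible way to handle them separately is to test the full match (the first column) against a decimal-degree pattern before any further conversion. This is only a sketch: parse_decimal_degrees and its pattern are assumptions, not part of srUtils.

```r
# Hypothetical helper (not part of srUtils): parse matches written as decimal degrees.
parse_decimal_degrees <- function(x) {
  # number with a decimal part, a degree-like symbol (° or ◦), then a hemisphere letter
  pattern <- "^\\s*([0-9]+\\.[0-9]+)\\s*[\u00b0\u25e6]\\s*([NSEW])\\s*$"
  is_decimal <- grepl(pattern, x, perl = TRUE)
  value <- rep(NA_real_, length(x))
  value[is_decimal] <- as.numeric(sub(pattern, "\\1", x[is_decimal], perl = TRUE))
  hemisphere <- sub(pattern, "\\2", x, perl = TRUE)
  # southern and western hemispheres are negative by convention
  south_west <- is_decimal & hemisphere %in% c("S", "W")
  value[south_west] <- -value[south_west]
  value  # NA where the match is not in decimal-degree format
}

parse_decimal_degrees(c("10.17 \u00b0W", "51.47 \u00b0N", "52\u00b007 N"))
```

Matches that return NA here are in some other format, such as degrees and minutes or eastings and northings, and need their own handling.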