Overview of OpenAlex

Before starting, please read the documentation for the package. In this document, I provide preparatory code adapted from the work of those in the Quantitative Histories Workshop or other users in the Howard network.

Detailed information on the openalexR package is available online at https://journal.r-project.org/articles/RJ-2023-089/. You can view the package’s README online at https://docs.ropensci.org/openalexR/.

The OpenAlex database consists of eight academic entity types:

- works
- authors
- institutions
- sources
- concepts
- publishers
- funders
- geo

# install the development version of openalexR from GitHub (requires the remotes package)
# install.packages("remotes", repos = "http://cran.us.r-project.org")
remotes::install_github("ropensci/openalexR")
## Using GitHub PAT from the git credential store.
## Downloading GitHub repo ropensci/openalexR@HEAD
## curl (6.0.1 -> 6.1.0) [CRAN]
## Installing 1 packages: curl
## 
## The downloaded binary packages are in
##  /var/folders/54/02xngdrx2277pg5hxjx3z8040000gn/T//Rtmpcf7QzL/downloaded_packages
## ── R CMD build ─────────────────────────────────────────────────────────────────
##   ✔  checking for file ‘/private/var/folders/54/02xngdrx2277pg5hxjx3z8040000gn/T/Rtmpcf7QzL/remotes1504049eb8d92/ropensci-openalexR-e1aa4cd/DESCRIPTION’ (656ms)
##   ─  preparing ‘openalexR’:
##   ✔  checking DESCRIPTION meta-information
##   ─  checking for LF line-endings in source and make files and shell scripts (393ms)
##   ─  checking for empty or unneeded directories
##    Removed empty directory ‘openalexR/vignettes’
##   ─  building ‘openalexR_1.4.0.tar.gz’
##      
## 
# or install the released version of openalexR from CRAN
install.packages("openalexR", repos = "http://cran.us.r-project.org")
## 
## The downloaded binary packages are in
##  /var/folders/54/02xngdrx2277pg5hxjx3z8040000gn/T//Rtmpcf7QzL/downloaded_packages
# add the polite-use option (your email) in one of two ways

# either open .Rprofile and add the options() line below
# file.edit("~/.Rprofile")
# options(openalexR.mailto = "first.last@Howard.edu")

# or open .Renviron and add the line below
# file.edit("~/.Renviron")
# openalexR.mailto = first.last@Howard.edu

# load libraries
library(openalexR)
## Thank you for using openalexR!
## To acknowledge our work, please cite the package by calling `citation("openalexR")`.
## To suppress this message, add `openalexR.message = suppressed` to your .Renviron file.
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
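
After restarting R, a quick check along the following lines can confirm whether the polite-use email is visible to the session. This is a minimal sketch: `has_polite_email()` is a hypothetical helper name introduced here, not part of openalexR, and it only inspects the R option set through `options()`.

```r
# Hypothetical helper: TRUE if the openalexR.mailto option holds a non-empty string.
has_polite_email <- function() {
  mailto <- getOption("openalexR.mailto")
  !is.null(mailto) && is.character(mailto) && nzchar(mailto[1])
}

if (has_polite_email()) {
  message("openalexR.mailto is set to: ", getOption("openalexR.mailto"))
} else {
  message("openalexR.mailto is not set; add it to .Rprofile or .Renviron.")
}
```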

Concept searches

In OpenAlex, we can explore the universe of concepts for a particular term.

We can start with racism.

racism <- oa_fetch(
  entity = "concepts",
  display_name.search = "racism" # search by concept name "racism"
)

# print all rows of the result, including the display names
racism %>% 
  print(n = Inf)
## # A tibble: 5 × 17
##   id      display_name display_name_interna…¹ description description_internat…²
##   <chr>   <chr>        <list>                 <chr>       <list>                
## 1 https:… Racism       <chr [152]>            belief tha… <chr [40]>            
## 2 https:… Rac GTP-Bin… <chr [5]>              <NA>        <lgl [1]>             
## 3 https:… Psychometri… <chr [1]>              <NA>        <lgl [1]>             
## 4 https:… Institution… <chr [30]>             racism roo… <chr [9]>             
## 5 https:… Anti-racism  <chr [34]>             beliefs, a… <chr [10]>            
## # ℹ abbreviated names: ¹​display_name_international, ²​description_international
## # ℹ 12 more variables: wikidata <chr>, level <int>, ids <list>,
## #   image_url <chr>, image_thumbnail_url <chr>, ancestors <list>,
## #   related_concepts <list>, relevance_score <dbl>, works_count <int>,
## #   cited_by_count <int>, counts_by_year <list>, works_api_url <chr>
# we can also view the IDs directly
racism$id
## [1] "https://openalex.org/C139838865"  "https://openalex.org/C2909331102"
## [3] "https://openalex.org/C137863202"  "https://openalex.org/C2776793201"
## [5] "https://openalex.org/C2779234418"

Next, I narrow the search to antiblackness.

antiblackness <- oa_fetch(
  entity = "concepts",
  display_name.search = "antiblackness" # search by concept name "antiblackness"
)
## Warning in oa_request(oa_query(filter = filter_i, multiple_id = multiple_id, :
## No records found!

Notice that no records were found for this term. You can try alternative terms to see which related concepts the database includes.
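
One way to try several candidate terms at once is sketched below. The helper `count_concepts()` and its `fetch` argument are hypothetical names introduced here, not part of openalexR. The sketch assumes that `oa_fetch()` returns a data frame when concepts match and an empty result (such as `NULL`) when nothing matches, and it silences the "No records found!" warning so that the loop keeps going.

```r
# Hypothetical helper: number of concepts returned for each search term.
# `fetch` defaults to openalexR::oa_fetch but can be swapped for a stub when testing offline.
count_concepts <- function(terms, fetch = openalexR::oa_fetch) {
  vapply(terms, function(term) {
    # Note: suppressWarnings() hides every warning, including "No records found!"
    res <- suppressWarnings(
      fetch(entity = "concepts", display_name.search = term)
    )
    # NROW() is 0 for NULL or an empty list, and the row count for a data frame
    as.integer(NROW(res))
  }, integer(1))
}

# Example call (queries the OpenAlex API, so it needs an internet connection):
# count_concepts(c("racism", "antiblackness", "anti-racism"))
```

The result is a named integer vector, so terms with a count of zero are easy to spot and revise.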

Literature searches

To conduct literature searches, we use the search argument of the oa_fetch() function.

racism_stem <- oa_fetch(
  search = "racism AND stem",
  pages = 1,
  per_page = 10,
  verbose = TRUE
)
## Requesting url: https://api.openalex.org/works?search=racism%20AND%20stem
## Using basic paging...
## Getting 1 page of results with a total of 10 records...
racism_stem %>% 
  show_works(simp_func = identity) %>% 
  select(1:4)
## # A tibble: 10 × 4
##    id          display_name                             first_author last_author
##    <chr>       <chr>                                    <chr>        <chr>      
##  1 W3103976293 Interrogating Structural Racism in STEM… Ebony O. Mc… <NA>       
##  2 W2122152063 Rac and Cdc42 GTPases control hematopoi… Feng‐Chun Y… David A. W…
##  3 W1963010559 Critical Race Theory, Racial Microaggre… Daniel G. S… Tara J. Yo…
##  4 W2064771788 Rac GTPases differentially integrate si… José A. Can… David A. W…
##  5 W3162198024 Racism camouflaged as impostorism and t… Ebony O. Mc… Devin T. W…
##  6 W2032731146 A Dual Role of the GTPase Rac in Cardia… Michel Pucé… Philippe F…
##  7 W2073080076 Niche-Associated Activation of Rac Prom… Wen Lü       Edwin L. F…
##  8 W2081016623 AIDS and Accusation: Haiti and the Geog… Robert G. C… Paul Farmer
##  9 W2087413829 Stem Cell Depletion Through Epidermal D… Salvador Az… Fiona M. W…
## 10 W1981364695 Three-Dimensional Nanofibrillar Surface… Alam Nur‐E‐… Sally Mein…

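Setting entity = "works" lets us restrict the search to titles with title.search, and count_only = TRUE returns the number of matching records instead of the records themselves. The first call below requires both terms (AND), the second accepts either term (|), and the third downloads the records for the first query: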

oa_fetch(
  entity = "works",
  title.search = "racism AND STEM",
  count_only = TRUE
)
##      count db_response_time_ms page per_page
## [1,]    56                  41    1        1
oa_fetch(
  entity = "works",
  title.search = "racism|STEM",
  count_only = TRUE
)
##       count db_response_time_ms page per_page
## [1,] 541427                 122    1        1
biblio_works <- oa_fetch(
  entity = "works",
  title.search = "racism AND STEM",
  count_only = FALSE
)

biblio_works
## # A tibble: 56 × 39
##    id     title display_name author ab    publication_date relevance_score so   
##    <chr>  <chr> <chr>        <list> <chr> <chr>                      <dbl> <chr>
##  1 https… Inte… Interrogati… <df>   The … 2020-11-13                 303.  Educ…
##  2 https… Rac … Rac GTPases… <df>   <NA>  2005-07-17                 271.  Natu…
##  3 https… Rac … Rac and Cdc… <df>   Crit… 2001-04-24                 229.  Proc…
##  4 https… The … The role of… <df>   <NA>  2006-07-24                 143.  Expe…
##  5 https… A Du… A Dual Role… <df>   The … 2003-03-25                 143.  Mole…
##  6 https… Raci… Racism camo… <df>   Blac… 2021-05-13                 141.  Race…
##  7 https… Rac … Rac Mediate… <df>   <NA>  2011-11-01                 111.  Cell…
##  8 https… Nich… Niche-Assoc… <df>   Back… 2012-07-03                 111.  PLoS…
##  9 https… Role… Role of Rac… <df>   In t… 2005-11-01                  95.8 Anti…
## 10 https… An i… An injury-r… <df>   <NA>  2022-05-20                  95.2 Cell…
## # ℹ 46 more rows
## # ℹ 31 more variables: so_id <chr>, host_organization <chr>, issn_l <chr>,
## #   url <chr>, pdf_url <chr>, license <chr>, version <chr>, first_page <chr>,
## #   last_page <chr>, volume <chr>, issue <chr>, is_oa <lgl>,
## #   is_oa_anywhere <lgl>, oa_status <chr>, oa_url <chr>,
## #   any_repository_has_fulltext <lgl>, language <chr>, grants <list>,
## #   cited_by_count <int>, counts_by_year <list>, publication_year <int>, …

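Please note that some of the results do not match our intended topic. Several titles concern Rac GTPases and stem cells rather than racism in STEM fields. The search is not case-sensitive, so "STEM" also matches "stem", and "racism" appears to match "Rac", possibly because the search engine reduces words to their stems.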
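
A rough post-filter on titles can help separate the intended records from the rest. The sketch below is only an illustration: `toy_works` is a made-up stand-in for `biblio_works` (its titles are not real records), and the filter assumes that relevant titles spell the acronym STEM in capitals and contain the word "racism".

```r
# Toy stand-in for `biblio_works`; these titles are invented for illustration.
toy_works <- data.frame(
  display_name = c(
    "Structural racism in STEM education",
    "Rac GTPases in hematopoietic stem cells",
    "Racism and belonging in STEM doctoral programs",
    "Stem cell niches and Rac signaling"
  ),
  stringsAsFactors = FALSE
)

# "racism" in any letter case
has_racism <- grepl("racism", toy_works$display_name, ignore.case = TRUE)

# "STEM" as a whole word, capitals only (case-sensitive on purpose)
has_stem <- grepl("\\bSTEM\\b", toy_works$display_name, perl = TRUE)

# Keep only rows that satisfy both conditions
relevant <- toy_works[has_racism & has_stem, , drop = FALSE]
relevant$display_name
```

With real results, the same two conditions could be applied to `biblio_works$display_name`. A title printed entirely in capitals would defeat the case-sensitive check, so the filtered set is still worth reviewing by hand.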