knitr::opts_chunk$set(warning = FALSE, message = FALSE)
library(tidyverse)
library(vroom)
library(ggsci)
library(scholar)
load("gs.RData")
The R package scholar allows you to access Google Scholar citation data using a Google Scholar ID. I’m still exploring the possibilities, but the package is very cool.
Google Scholar monitors the number of queries it receives, so be careful how you structure your code. Many of my chunks are commented out so that they will not run. The data were saved earlier and are loaded into the R environment at the top of this document, so the queries do not need to be repeated.
We can use Jason B. Reed as an example.
Find the ID in the URL of the scholar's Google Scholar profile page (it is the value of the user parameter). The package can also look up an ID by name with get_scholar_id().
gs_id <- "CpBa1V4AAAAJ"
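As a small illustration of pulling the ID out of a profile URL, here is a minimal sketch. It assumes the usual profile URL shape, where the ID is the value of the user query parameter. Both extract_scholar_id() and the example URL are hypothetical illustrations, not part of the scholar package.

```r
# Hypothetical helper (not from the scholar package): return the value of the
# `user` query parameter in a Google Scholar profile URL, or NA if absent.
extract_scholar_id <- function(url) {
  # Match text that follows "?user=" or "&user=", up to the next "&" or "#"
  hit <- regmatches(url, regexpr("(?<=[?&]user=)[^&#]+", url, perl = TRUE))
  if (length(hit) == 0) NA_character_ else hit
}

# Assumed URL shape for the profile used in this post
profile_url <- "https://scholar.google.com/citations?user=CpBa1V4AAAAJ&hl=en"
gs_id_from_url <- extract_scholar_id(profile_url)
gs_id_from_url
```

The helper handles one URL at a time, which is enough for copying an ID out of a browser address bar.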
Get the basic profile data for Jason. (Avoid running this chunk more than once.)
# jbr_profile <- get_profile(gs_id)
The result is a list.
jbr_profile
## $id
## [1] "CpBa1V4AAAAJ"
##
## $name
## [1] "Jason B Reed"
##
## $affiliation
## [1] "Associate Professor, Purdue University"
##
## $total_cites
## [1] 425
##
## $h_index
## [1] 13
##
## $i10_index
## [1] 18
##
## $fields
## [1] "Systematic Reviews" "Library Management"
## [3] "Library Professional Development"
##
## $homepage
## character(0)
##
## $coauthors
## [1] "Benjamin Jahre" "Alexander J. Carroll" "Leo S. Lo"
## [4] "Sort by citations" "Sort by year" "Sort by title"
## [7] "About Scholar" "Search help"
##
## $available
## [1] 3
##
## $not_available
## [1] 0
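Note that the coauthors element above includes several strings that look like page navigation text rather than names. A minimal sketch for dropping them, assuming the list of unwanted labels (copied from the output above) is complete for your own profile:

```r
# Coauthors element as printed in the profile output above
coauthors <- c(
  "Benjamin Jahre", "Alexander J. Carroll", "Leo S. Lo",
  "Sort by citations", "Sort by year", "Sort by title",
  "About Scholar", "Search help"
)

# ASSUMPTION: these labels are page text, not people; extend the list if
# other stray strings show up in your own results
page_text <- c(
  "Sort by citations", "Sort by year", "Sort by title",
  "About Scholar", "Search help"
)

# setdiff() keeps the elements of the first vector that are not in the second
coauthors_clean <- setdiff(coauthors, page_text)
coauthors_clean
```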
You can use another function to extract the publications from Google Scholar.
# jbr_pubs <- get_publications(gs_id)
The result is a data frame, but I prefer a tibble. The columns are common bibliometric fields, including each publication's total citation count (cites).
jbr_pubs <- jbr_pubs %>%
  as_tibble()
jbr_pubs
## # A tibble: 55 × 8
## title author journal number cites year cid pubid
## <chr> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 Pharmacists’ impact on older a… JL Ne… Vaccine "38 (… 40 2020 1473… roLk…
## 2 The effects of serious gaming … L van… Climat… "170 … 35 2022 1820… qxL8…
## 3 Poultry consumption and human … G Con… Advanc… "13 (… 33 2022 1593… TQgY…
## 4 Reviewing the current state of… JB Re… Collec… "44 (… 32 2019 2445… W7OE…
## 5 Meat consumption and gut micro… Y Wan… Advanc… "14 (… 23 2023 1174… QIV2…
## 6 Examining positive youth devel… E Maj… The Jo… "42 (… 23 2022 1711… YOwf…
## 7 A scoping review of engineerin… M Phi… Journa… "113 … 22 2024 1761… r0Bp…
## 8 Not just playing: The politics… JM Ve… Geofor… "137,… 21 2022 1178… 4DMP…
## 9 Effect of pharmacy-led interve… M Har… Journa… "62 (… 21 2022 4429… Wp0g…
## 10 The gender wage gap in researc… HA Ho… Colleg… "" 19 2020 8504… Se3i…
## # ℹ 45 more rows
You can get the yearly citation history for specific publications. Below, I do this for JBR's 10 most-cited articles, which are the first 10 rows of the publications table. I’m not sure what query limits apply here.
# pubids <- pull(jbr_pubs, pubid)[1:10]
# jbr_ach <- get_article_cite_history(gs_id, pubids[1])
# for (i in 2:10) {
#   ach <- get_article_cite_history(gs_id, pubids[i])
#   jbr_ach <- bind_rows(jbr_ach, ach)
# }
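Since the query limits are unclear, one cautious option is to pause between requests. The sketch below is an illustration, not the package's recommended approach: fetch_cite_histories() is a hypothetical helper, the 2-second default pause is a guess, and a stand-in function replaces get_article_cite_history() so the example runs without querying Google Scholar.

```r
library(dplyr)

# Hypothetical helper: call `fetch(id, pubid)` once per pubid, pausing between
# calls, then stack the results into one data frame.
fetch_cite_histories <- function(id, pubids, fetch, pause = 2) {
  results <- vector("list", length(pubids))
  for (i in seq_along(pubids)) {
    if (i > 1) Sys.sleep(pause)  # wait between queries; the default is a guess
    results[[i]] <- fetch(id, pubids[i])
  }
  bind_rows(results)
}

# Stand-in for get_article_cite_history() that returns made-up numbers
fake_fetch <- function(id, pubid) {
  data.frame(year = 2023:2024, cites = c(1, 2), pubid = pubid)
}

demo_ach <- fetch_cite_histories("some_id", c("pubA", "pubB"), fake_fetch, pause = 0)
demo_ach

# With the real package (not run here):
# jbr_ach <- fetch_cite_histories(gs_id, pubids, get_article_cite_history)
```

Collecting the pieces in a list and binding once also avoids growing the data frame inside the loop.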
The result identifies each article only by its pubid. However, you could devise ways to join this to other data, such as the journal and publication year.
jbr_ach <- jbr_ach %>%
  as_tibble()
jbr_ach
## # A tibble: 41 × 3
## year cites pubid
## <int> <dbl> <chr>
## 1 2020 1 roLk4NBRz8UC
## 2 2021 12 roLk4NBRz8UC
## 3 2022 5 roLk4NBRz8UC
## 4 2023 11 roLk4NBRz8UC
## 5 2024 8 roLk4NBRz8UC
## 6 2025 2 roLk4NBRz8UC
## 7 2022 3 qxL8FJ1GzNcC
## 8 2023 13 qxL8FJ1GzNcC
## 9 2024 15 qxL8FJ1GzNcC
## 10 2025 4 qxL8FJ1GzNcC
## # ℹ 31 more rows
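Because both tables carry pubid, a join can attach publication details to each citation-history row. Here is a minimal sketch on made-up data shaped like the two tables. The pubids, titles, and journals are placeholders, and renaming the publication year to pub_year is my own choice to avoid a clash with the citation year.

```r
library(dplyr)

# Toy stand-in for the publications table (placeholder values)
pubs <- data.frame(
  pubid   = c("pubA", "pubB"),
  title   = c("First article", "Second article"),
  journal = c("Journal A", "Journal B"),
  year    = c(2020, 2022)
)

# Toy stand-in for the citation history table (placeholder values)
ach <- data.frame(
  year  = c(2020L, 2021L, 2022L, 2023L),
  cites = c(1, 12, 3, 13),
  pubid = c("pubA", "pubA", "pubB", "pubB")
)

# Both tables have a `year` column with different meanings, so rename the
# publication year before joining on pubid
ach_labeled <- ach %>%
  left_join(
    pubs %>% select(pubid, title, journal, pub_year = year),
    by = "pubid"
  ) %>%
  mutate(years_since_pub = year - pub_year)

ach_labeled
```

With the journal attached, the plots could be colored or faceted by journal instead of by the opaque pubid.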
Visualize the yearly citations.
jbr_ach %>%
  ggplot() +
  aes(x = year, y = cites, color = pubid) +
  geom_line(lwd = 1) +
  scale_color_aaas() +
  theme_bw()
Determine cumulative citations and visualize.
jbr_ach %>%
  group_by(pubid) %>%
  mutate(cumal_cites = cumsum(cites)) %>%
  ggplot() +
  aes(x = year, y = cumal_cites, color = pubid) +
  geom_line(lwd = 1) +
  scale_color_aaas() +
  theme_bw()
You can also look up journal rankings for the journals in the publication list. I have questions about this, though: the matching is fuzzy, but the result does not show which query matched which result.
# jbr_journals <- jbr_pubs %>%
#   pull(journal) %>%
#   unique()
# jbr_jr <- get_journalrank(journals = jbr_journals, max.distance = 0.1)
jbr_jr <- jbr_jr %>%
  as_tibble()
jbr_jr
## # A tibble: 29 × 20
## Rank Sourceid Journal Type Issn SJR SJR.Best.Quartile H.index
## <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <int>
## 1 2384 21376 Vaccine jour… 0264… 1.39 Q1 191
## 2 2513 12177 Climatic Chan… jour… 1573… 1.36 Q1 198
## 3 1107 21100202730 Advances in N… jour… 2156… 2.15 Q1 103
## 4 9327 4700152789 Collection Ma… jour… 0146… 0.536 Q1 19
## 5 4671 25675 Journal of Ea… jour… 1552… 0.917 Q1 73
## 6 2735 12481 Journal of En… jour… 1069… 1.29 Q1 113
## 7 2287 28611 Geoforum jour… 0016… 1.42 Q1 125
## 8 1 28773 Ca-A Cancer J… jour… 1542… 56.2 Q1 182
## 9 3440 14238 College and R… jour… 0010… 1.11 Q1 55
## 10 9310 19500157042 Currents in P… jour… 1877… 0.537 Q1 23
## # ℹ 19 more rows
## # ℹ 12 more variables: Total.Docs...2021. <int>, Total.Docs...3years. <int>,
## # Total.Refs. <int>, Total.Cites..3years. <int>,
## # Citable.Docs...3years. <int>, Cites...Doc...2years. <dbl>,
## # Ref....Doc. <dbl>, Country <chr>, Region <chr>, Publisher <chr>,
## # Coverage <chr>, Categories <chr>
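One way to audit the fuzzy matching is to put each query next to the name that came back and measure how far apart they are. The sketch below uses made-up journal names. It rests on a loud assumption that I have not verified: that results can be paired with queries by position. The choice of adist() and of scaling by the longer name is mine and is not necessarily how the package matches internally.

```r
# Toy data: names sent as queries, and the names that came back.
# ASSUMPTION: results pair with queries by position; check this against your
# own output before relying on it. All journal names here are invented.
queries <- c("Vaccine", "The Journal of Sample Studies",
             "Annals of Hypothetical Research")
matched <- c("Vaccine", "Journal of Sample Studies",
             "Unrelated Review Quarterly")

# Edit distance between each query and its paired result (case-insensitive)
edit_dist <- unname(mapply(
  function(q, m) utils::adist(q, m, ignore.case = TRUE)[1, 1],
  queries, matched
))

match_check <- data.frame(
  query     = queries,
  matched   = matched,
  edit_dist = edit_dist,
  # Scale by the longer name so long and short titles are comparable
  rel_dist  = edit_dist / pmax(nchar(queries), nchar(matched))
)

# Largest relative distance first: these rows deserve a manual look
match_check[order(-match_check$rel_dist), ]
```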
You can also compare the citation histories of several scholars at once. This is a cool feature. Be careful not to repeat queries unnecessarily.
# compare_ids <- c("CpBa1V4AAAAJ", "5qP2Sl0AAAAJ", "2xGm54gAAAAJ",
#                  "2Zj8gKoAAAAJ", "zac0dKsAAAAJ", "zdI9WLAAAAAJ",
#                  "aTWTBZMAAAAJ", "ZPJBrW4AAAAJ", "1SW-xYYAAAAJ")
# compare_tb <- compare_scholar_careers(compare_ids) %>%
#   as_tibble()
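The result gives yearly citation counts for each author, plus a career_year column that counts years from the author's first year in the data.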
compare_tb
## # A tibble: 181 × 5
## id year cites career_year name
## <chr> <dbl> <dbl> <dbl> <chr>
## 1 1SW-xYYAAAAJ 2018 15 0 Margaret Phillips
## 2 1SW-xYYAAAAJ 2019 40 1 Margaret Phillips
## 3 1SW-xYYAAAAJ 2020 61 2 Margaret Phillips
## 4 1SW-xYYAAAAJ 2021 85 3 Margaret Phillips
## 5 1SW-xYYAAAAJ 2022 124 4 Margaret Phillips
## 6 1SW-xYYAAAAJ 2023 196 5 Margaret Phillips
## 7 1SW-xYYAAAAJ 2024 203 6 Margaret Phillips
## 8 1SW-xYYAAAAJ 2025 48 7 Margaret Phillips
## 9 2xGm54gAAAAJ 2001 15 0 Michael Fosmire
## 10 2xGm54gAAAAJ 2002 9 1 Michael Fosmire
## # ℹ 171 more rows
Compare yearly citations.
compare_tb %>%
  ggplot() +
  aes(x = year, y = cites, color = name) +
  geom_line(lwd = 1) +
  scale_x_continuous(limits = c(2012, 2026), breaks = seq(from = 2012, to = 2026, by = 2)) +
  scale_color_simpsons() +
  theme_bw() +
  ggtitle("Yearly Citations for Select Libraries' Faculty")
Compare cumulative citations.
compare_tb %>%
  group_by(name) %>%
  mutate(cumal_cites = cumsum(cites)) %>%
  ggplot() +
  aes(x = year, y = cumal_cites, color = name) +
  geom_line(lwd = 1) +
  scale_x_continuous(limits = c(2012, 2026), breaks = seq(from = 2012, to = 2026, by = 2)) +
  scale_color_simpsons() +
  theme_bw() +
  ggtitle("Cumulative Citations for Select Libraries' Faculty")
It seems unfair to compare shorter careers to longer careers. One option is to limit the years, counting only citations received from 2018 onward. Long-established scholars will still have more publications collecting citations.
compare_tb %>%
  filter(year >= 2018) %>%
  group_by(name) %>%
  mutate(cumal_cites = cumsum(cites)) %>%
  ggplot() +
  aes(x = year, y = cumal_cites, color = name) +
  geom_line(lwd = 1) +
  scale_x_continuous(limits = c(2018, 2026), breaks = seq(from = 2018, to = 2026, by = 2)) +
  scale_color_simpsons() +
  theme_bw() +
  ggtitle("Cumulative Citations for Select Libraries' Faculty")
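Another option, which I have not explored above, is to line careers up by the career_year column instead of the calendar year, so that every scholar starts at zero. Here is a minimal sketch on made-up data shaped like compare_tb. The names and counts are invented, and the cutoff of career year 1 is only there to keep the toy example small.

```r
library(dplyr)

# Toy stand-in for compare_tb: one long career and one short career
careers <- data.frame(
  name        = c(rep("Scholar A", 4), rep("Scholar B", 2)),
  year        = c(2001:2004, 2023:2024),
  cites       = c(5, 10, 20, 30, 8, 12),
  career_year = c(0:3, 0:1)
)

by_career_year <- careers %>%
  group_by(name) %>%
  arrange(career_year, .by_group = TRUE) %>%  # make sure cumsum runs in order
  mutate(cumal_cites = cumsum(cites)) %>%
  ungroup() %>%
  # Keep only the career years that every scholar in the toy data has reached
  filter(career_year <= 1)

by_career_year
```

Plotting cumal_cites against career_year, colored by name, would then compare scholars at the same career stage.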
Save the data in the environment so the Google Scholar queries do not need to be repeated.
# save.image("gs.RData")
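An alternative to commenting chunks in and out is to cache each result in its own file and query only when the file is missing. This is a sketch of that pattern, not what was done above: cached() is a hypothetical helper, and a stand-in function replaces the real Google Scholar query so the example runs offline.

```r
# Hypothetical helper: return the saved result if `path` exists; otherwise
# call `fetch()`, save what it returns to `path`, and return it.
cached <- function(path, fetch) {
  if (file.exists(path)) {
    readRDS(path)
  } else {
    result <- fetch()
    saveRDS(result, path)
    result
  }
}

# Stand-in for a real query; counts how many times it is actually called
n_queries <- 0
fake_query <- function() {
  n_queries <<- n_queries + 1
  list(name = "Example Scholar", total_cites = 100)
}

cache_file <- tempfile(fileext = ".rds")
first  <- cached(cache_file, fake_query)  # runs the query and saves the result
second <- cached(cache_file, fake_query)  # reads the saved file instead
n_queries

# With the real package (not run here):
# jbr_profile <- cached("jbr_profile.rds", function() get_profile(gs_id))
```

Deleting the file is then the deliberate way to force a fresh query.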