This guide explains how to visualize the key themes of a researcher’s
work from their Google Scholar profile using a wordcloud. It is designed
to be accessible to users with only a basic understanding of the R
programming language.
(The entire code for this tutorial is consolidated into a single
function and presented at the end of this document. If you’re only
interested in generating the wordcloud and don’t care about the
procedure, feel free to skip to the end of this document.)
Load abstracts from Google Scholar
First, we need to specify a Google Scholar profile using a
researcher’s name. To do this, assign an author’s first and last name to
the objects “FirstName” and “LastName” (respectively). For the purposes
of this tutorial, I’ll use my own name.
FirstName<-"Zak"
LastName<-"Witkower"
Next, retrieve all items indexed by Google Scholar (e.g.,
publications) for the specified researcher, using the scholar
library.
library(scholar)
library(tibble) # for as_tibble(), which gives tidier printing of the results
# Retrieve the unique Google Scholar identifier using the first and last name objects created above
IDnumber <- get_scholar_id(first_name = FirstName,
                           last_name = LastName)
# Generate a list of the items indexed in Google Scholar
Mypublications <- as_tibble(get_publications(IDnumber))
The resulting data frame has one row for each unique item indexed by
Google Scholar for the specified researcher, and one column for each
variable or attribute (e.g., title, journal name, and number of
citations).
## # A tibble: 10 × 8
## title author journal number cites year cid pubid
## <chr> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 Two signals of social rank: Pr… Z Wit… Journa… 118 (… 148 2020 1283… qjMa…
## 2 Bodily communication of emotio… Z Wit… Emotio… 11 (2… 125 2019 1582… UeHW…
## 3 Predicting cyberbullying perpe… C Bar… Aggres… 43 (2… 102 2017 5427… 2osO…
## 4 The evolution of pride and soc… JL Tr… Advanc… 62, 5… 85 2020 7726… hqOj…
## 5 A facial-action imposter: How … Z Wit… Psycho… 30 (6… 47 2019 3325… W7OE…
## 6 The psychological structure, s… E Mer… Curren… 39, 1… 39 2021 8498… ULOm…
## 7 How affect shapes status: Dist… Z Wit… Curren… 33, 1… 37 2020 1574… YsMS…
## 8 Breaking the link between prov… C Bar… Aggres… 42 (6… 19 2016 5389… 9yKS…
## 9 Evidence for distinct facial s… JD Ma… Affect… 2, 14… 18 2021 1827… KlAt…
## 10 Beyond face value: Evidence fo… Z Wit… Affect… 2 (3)… 17 2021 1677… _kc_…
Next, we will retrieve the abstract of each publication using a loop,
and add it to our data frame as a new column.
# Initialize a new column for the abstracts; NA marks items with no abstract found
Mypublications$abstract <- rep(NA_character_, nrow(Mypublications))
# "for" loop to add the abstract of each publication
for (i in seq_len(nrow(Mypublications))) {
  Abstracts <- get_publication_abstract(id = IDnumber,
                                        pub_id = Mypublications$pubid[i])
  # Only assign when an abstract was found: assigning a zero-length result
  # would throw a "replacement has length zero" error
  if (length(Abstracts) > 0) {
    Mypublications$abstract[i] <- Abstracts[1]
  }
}
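The wordcloud code in the next section expects a term-frequency table named `tidy.TDM`, but the steps that build it (corpus, term-document matrix, tidying, top 100 terms) only appear inside the consolidated function at the end of this document. The sketch below pulls those same steps into a small helper so the walkthrough can run end to end. `abstracts_to_tidy_tdm()` is a hypothetical name introduced here, the toy abstracts are made up, and it assumes the tm, tidytext, and dplyr packages are installed.

```r
library(tm)        # VCorpus(), VectorSource(), TermDocumentMatrix()
library(tidytext)  # tidy() method for TermDocumentMatrix objects
library(dplyr)     # group_by(), summarise(), arrange()

# Hypothetical helper: turn a character vector of abstracts into a
# (term, count) table, most frequent first, keeping the top_n terms.
abstracts_to_tidy_tdm <- function(abstracts, top_n = 100) {
  abstracts <- abstracts[!is.na(abstracts)]  # skip items without an abstract
  corpus <- VCorpus(VectorSource(abstracts))
  tdm <- TermDocumentMatrix(corpus,
                            control = list(removePunctuation = TRUE,
                                           stopwords = TRUE,
                                           tolower = TRUE,
                                           stemming = FALSE,
                                           removeNumbers = TRUE))
  tidy(tdm) %>%                        # one row per term-document pair
    group_by(term) %>%
    summarise(count = sum(count)) %>%  # total count of each term across abstracts
    arrange(desc(count)) %>%
    head(top_n)
}

# Toy example with made-up abstracts (the NA stands for a missing abstract)
toy <- abstracts_to_tidy_tdm(c("Pride signals status. Pride matters.", NA,
                               "Status and pride are signals of rank."))
toy
```

With the `Mypublications` table from the previous step, `tidy.TDM <- abstracts_to_tidy_tdm(Mypublications$abstract)` creates the object passed to `wordcloud2()` below.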
Create wordcloud
Finally, let’s create the wordcloud.
(Note: You can adjust the aesthetics of the wordcloud by changing the
arguments in the code below.)
library(wordcloud2)
library(randomcoloR)
# Create the wordcloud from the term-frequency table (term, count)
wordcloud2(tidy.TDM,
           color = randomColor(nrow(tidy.TDM),  # one random color per word (randomcoloR)
                               hue = "random",
                               luminosity = "dark"),
           fontWeight = "bold",                 # bold all words
           rotateRatio = 0,                     # proportion of words rotated (0 = all horizontal)
           size = 0.75,                         # overall scale of the words
           fontFamily = "Times")                # font of the wordcloud
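The arguments above are only a few of those `wordcloud2()` accepts. As a sketch of some other documented options, the code below uses a made-up term table in place of `tidy.TDM` and sets the background color, the layout shape, and the rotation settings: `rotateRatio` is the proportion of words that get rotated, and `minRotation`/`maxRotation` bound the rotation angle in radians. The specific values are illustrative choices, not recommendations.

```r
library(wordcloud2)
library(randomcoloR)

# Made-up term counts standing in for tidy.TDM (first column words, second column counts)
toy_counts <- data.frame(term  = c("emotion", "status", "pride", "signals", "face", "rank"),
                         count = c(12, 9, 7, 5, 4, 3))

wc <- wordcloud2(toy_counts,
                 size            = 0.75,
                 fontFamily      = "Times",
                 fontWeight      = "bold",
                 color           = randomColor(nrow(toy_counts), luminosity = "dark"),
                 backgroundColor = "white",
                 rotateRatio     = 0.3,      # roughly 30% of words are rotated
                 minRotation     = -pi / 2,  # rotated words are drawn vertically
                 maxRotation     = -pi / 2,
                 shape           = "diamond")
```

Printing `wc` (or typing it at the console) displays the widget in the RStudio Viewer.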
Limitations
If the specified author includes a middle initial in their Google
Scholar profile, or has a three-part name, the name-based lookup may fail
to find their profile, and you may be unable to generate their
wordcloud. There are a few possible solutions:
* Input their first and last name exclusively (e.g., input “Jessica L.
Tracy” as “Jessica Tracy”, or “Gerben van Kleef” as “Gerben
Kleef”)
* Input their first and middle initials as their first name (e.g., input
“Friedrich M. Götz” as “FM Götz”)
* If all else fails, you can retrieve a researcher’s Google Scholar ID
manually from the URL of their Google Scholar page, instead of looking
it up by first and last name (the method used here). Look for the text
“user=” in the URL and copy the 12 characters that follow. Assign this
ID as a character string to the object “IDnumber”, and skip the step
that retrieves the Google Scholar ID with the “get_scholar_id”
function.
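If you take the manual route in the last bullet, the ID can also be pulled out of the copied URL with base R rather than by eye. This is a minimal sketch: `extract_scholar_id()` is a hypothetical helper, and the URL and ID below are made up.

```r
# Hypothetical profile URL; replace it with the address copied from your browser
profile_url <- "https://scholar.google.com/citations?user=AbC123xyz_0A&hl=en"

# Return the value of the "user=" parameter (everything up to the next "&" or "#"),
# or NA if the URL has no such parameter
extract_scholar_id <- function(url) {
  id <- regmatches(url, regexpr("(?<=user=)[^&#]+", url, perl = TRUE))
  if (length(id) == 0) NA_character_ else id
}

IDnumber <- extract_scholar_id(profile_url)
IDnumber
```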
This method relies heavily on querying Google Scholar, which may
introduce some complications. For example:
* An internet connection is required
* The specified researcher must have an active Google Scholar
profile
* The method retrieves every item indexed by Google Scholar for the
specified profile, which may include non-peer-reviewed work or
supplemental materials
* Rate limits imposed by Google Scholar may restrict the total number
and speed of your requests.
  * If you try to generate a wordcloud for a particularly productive
  researcher, which requires requesting all of the author’s publications
  and abstracts, you may exceed Google’s rate limit (i.e., too many
  requests in a short period of time), resulting in an error.
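One possible way to soften the rate-limit problem is to pause between requests, and to record a missing abstract rather than stopping when a request fails. The sketch below is a hypothetical variant of the abstract loop, not a guarantee against being blocked. Google Scholar does not publish its limits, so the 2-second default for `pause` is a guess. The `fetch` argument defaults to `scholar::get_publication_abstract` (called with the same `id` and `pub_id` arguments used earlier) and can be swapped for a stand-in function, as in the offline demo.

```r
# Hypothetical helper: fetch one abstract per publication ID, pausing
# between requests and recording NA when a request errors or returns nothing
fetch_abstracts_politely <- function(id, pub_ids, pause = 2,
                                     fetch = scholar::get_publication_abstract) {
  abstracts <- rep(NA_character_, length(pub_ids))
  for (i in seq_along(pub_ids)) {
    result <- tryCatch(
      fetch(id = id, pub_id = pub_ids[i]),
      error = function(e) {
        message("Skipping ", pub_ids[i], ": ", conditionMessage(e))
        NULL  # treat a failed request like a missing abstract
      })
    if (length(result) > 0) abstracts[i] <- result[1]
    if (i < length(pub_ids)) Sys.sleep(pause)  # wait before the next request
  }
  abstracts
}

# Offline demo with a stand-in for get_publication_abstract()
fake_fetch <- function(id, pub_id) {
  if (pub_id == "bad") stop("too many requests")
  if (pub_id == "empty") return(character(0))
  paste("Abstract for", pub_id)
}
demo <- fetch_abstracts_politely("someID", c("a", "bad", "empty"),
                                 pause = 0, fetch = fake_fetch)
demo
```

In the tutorial flow, this would replace the loop with `Mypublications$abstract <- fetch_abstracts_politely(IDnumber, Mypublications$pubid)`. A longer pause trades speed for fewer rejected requests.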
Alternatively, you may wish to create a wordcloud of your research from
PDF files of your papers saved on your local drive. This approach is
described in more detail on my website.
If you have any questions, comments, or concerns, please get in touch
with me. I’m happy to help!
Email: Zakwitkower@gmail.com
Website: www.ZakWitkower.com
LinkedIn: Zak Witkower
Twitter: @Zakwitkower
Wrapping everything in a function
For simplicity, I’ve consolidated all of the steps into a single
function that creates a wordcloud from just a researcher’s first and
last name. Enter the first and last name of the researcher whose work
you want to visualize at the top of the code chunk (assigned to the
objects “FirstName” and “LastName”), and then run the rest of the code.
The function handles all of the remaining steps, including building the
term-frequency table and keeping the 100 most frequent words, and
outputs a wordcloud.
#####################################################################
####### Please input the first and last name of a researcher ########
#####################################################################
FirstName<-"Zak"
LastName<-"Witkower"
########################################################################################
#### After assigning a name above, run the rest of the code to create the function: ####
########################################################################################
#load libraries (library() stops with an error if a package is not installed)
library(scholar)
library(tibble)
library(tm)
library(tidytext)
library(tidyverse)
library(wordcloud2)
library(randomcoloR)
#Create function
GoogleScholarWordCloud<-function(first,last){
#Get ID number from name
IDnumber<-get_scholar_id(first_name = first,
last_name = last)
#Get publications
Mypublications <- as_tibble(get_publications(IDnumber))
#Add abstracts
Mypublications$abstract <- rep(NA_character_, nrow(Mypublications)) #NA = no abstract found
for (i in seq_len(nrow(Mypublications))) {
Abstracts <- get_publication_abstract(id = IDnumber,
pub_id = Mypublications$pubid[i])
if (length(Abstracts) > 0) Mypublications$abstract[i] <- Abstracts[1]
}
#Create Corpus (skipping publications without an abstract)
MyAbstracts.corpus <- VCorpus(VectorSource(Mypublications$abstract[!is.na(Mypublications$abstract)]))
#Create TDM from corpus
MyAbstracts.TDM<-TermDocumentMatrix(MyAbstracts.corpus,
control =
list(removePunctuation = TRUE,
stopwords = TRUE,
tolower = TRUE,
stemming = FALSE,
removeNumbers = TRUE))
#Tidy TDM: one row per term, with its total count across all abstracts
MyAbstracts.TDM <- tidy(MyAbstracts.TDM) %>%
group_by(term) %>%
summarise(count = sum(count)) %>%
arrange(desc(count))
#retain top 100 most frequently used words
MyAbstracts.TDM<-head(MyAbstracts.TDM, 100)
#create wordcloud
wordcloud<-wordcloud2(MyAbstracts.TDM,
fontWeight = "bold", #bold all items
rotateRatio = 0, #proportion of words rotated (0 = all horizontal)
size=.75, #size of wordcloud
fontFamily = "Times", #font of wordcloud
color = randomColor(nrow(MyAbstracts.TDM), #create color palette
hue = "random",
luminosity = "dark"))
return(wordcloud)
}
##################################################################
### Run the function we just created, using the assigned name ###
##################################################################
GoogleScholarWordCloud(FirstName, LastName)
############################ End. ###################################
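`wordcloud2()` returns an htmlwidget, which RStudio displays in its Viewer pane. If you want to keep a copy outside of R, one option is `htmlwidgets::saveWidget()`. The sketch below uses a made-up term table so it runs without querying Google Scholar; with real data you would pass the object returned by `GoogleScholarWordCloud(FirstName, LastName)` instead. Setting `selfcontained = FALSE` avoids the pandoc dependency of a single-file export, at the cost of a `_files` folder of supporting scripts written next to the HTML file.

```r
library(wordcloud2)
library(htmlwidgets)  # saveWidget()

# Made-up term counts standing in for the function's output
toy_counts <- data.frame(term  = c("emotion", "status", "pride", "signals"),
                         count = c(12, 9, 7, 4))
wc <- wordcloud2(toy_counts, size = 0.75)

# Write the widget to an HTML file in a temporary folder
out_file <- file.path(tempdir(), "wordcloud.html")
saveWidget(wc, out_file, selfcontained = FALSE)
```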
