- To access this, go to my website: https://tinyurl.com/meta-R
- Download the R Markdown file here: https://drive.google.com/file/d/1Nq0RdDQ1-dL4Kxn5J6QgfXodWFoEZnlI/view?usp=sharing
- @NotBenRivera on Twitter
- Live tutorial at 2:30 Pacific on 10.18.2022: https://ucdavis.zoom.us/s/97766151914
Example Word Cloud
I strongly recommend downloading the script from the link above and running it line by line rather than trying to run it all at once!
Below are the packages you will use here today! If you don’t have them all installed, uncomment the lines that start with just one “#” and run them to install those packages. Most are on CRAN, but two are installed from GitHub, which requires the packages remotes and devtools, so you may have to install those too.
#####Install from CRAN#######
#install.packages(c("tidyverse", "metagear", "googlesheets4", "googledrive", "here", "wordcloud", "wordcloud2", "tm"))
#####Install from GitHub####
#if (!require("remotes") ){install.packages("remotes")}
#remotes::install_github("netique/scihubr")
#devtools::install_github("juba/rwos") #may need to install 'devtools' from CRAN
#####Load Packages########
library(tidyverse)
library(rwos) # installed from GitHub (juba/rwos)
library(metagear)
library(scihubr) # installed from GitHub (netique/scihubr)
library(googlesheets4)
library(googledrive)
library(here)
library(wordcloud)
library(wordcloud2)
library(tm)
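If you are not sure which of these packages are already installed, a quick check like the one below can tell you before any `library()` call fails. This is only a minimal sketch: `missing_packages()` is a hypothetical helper (it is not part of the original script), built on base R's `requireNamespace()`.

```r
# Hypothetical helper: returns the names in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("tidyverse", "rwos", "metagear", "scihubr", "googlesheets4",
            "googledrive", "here", "wordcloud", "wordcloud2", "tm")
missing_packages(needed) # character(0) means everything is installed
```

Anything the helper returns still needs to be installed from CRAN or GitHub as described above.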
Example VPN
################ INSERT KEYWORDS BELOW ##################
SearchTerms <- "Coastal Dune Nitrogen Fixation" # <--- CHANGE THIS TO YOUR SEARCH TERM
#########################################################
sid <- wos_authenticate() #This connects you to the Web of Science API
res <- wos_search(sid, paste0("TS=(", SearchTerms, ")"), url = "http://search.webofknowledge.com") #this does the search
## 25 records found
pubs <- wos_retrieve_all(res) # This retrieves them
## Retrieving 1-25 of 25
#Checks to see if it works
head(pubs)
## # A tibble: 6 x 15
## uid title journal issue volume pages date year authors keywords doi
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 WOS:00017~ Char~ ACTA B~ 3 148 191-~ <NA> 2001 Hatimi~ "Acacia~ <NA>
## 2 WOS:00040~ Does~ BIOLOG~ <NA> 212 416-~ AUG 2017 Aggenb~ "Dune g~ 10.1~
## 3 WOS:00028~ Chan~ JOURNA~ 1 15 207-~ MAR 2011 Provoo~ "Coasta~ 10.1~
## 4 WOS:00024~ Gene~ APPLIE~ 15 73 5066~ AUG 2007 Rodrig~ "" 10.1~
## 5 WOS:00025~ Fact~ PLANT ~ 1-2 307 219-~ JUN 2008 Jones,~ "climat~ 10.1~
## 6 WOS:00020~ Biog~ PHYTOC~ <NA> 23 115-~ DEC ~ 1993 Gerlac~ "Nitrog~ <NA>
## # ... with 4 more variables: article_no <chr>, isi_id <chr>, issn <chr>,
## # isbn <chr>
dim(pubs) ### THIS IS IMPORTANT!!!!
## [1] 25 15
#THE FIRST NUMBER IS THE NUMBER OF PAPERS THE SEARCH RETURNED
if (nrow(pubs) > 300) quit(save = "ask") else print("All Good") # quits R (asking whether to save) if the search returned more than 300 papers
## [1] "All Good"
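The search string sent to Web of Science is just the keywords wrapped in the topic field tag, `TS=( ... )`. If you want tighter control than a single run of words, you can combine terms with explicit Boolean operators. A small sketch: `build_topic_query()` is a hypothetical helper, not an rwos function, and it only builds the string you would then pass to `wos_search()`.

```r
# Hypothetical helper: join search terms into a Web of Science topic query.
# Terms containing a space are quoted so they are searched as phrases.
build_topic_query <- function(terms, operator = "AND") {
  has_space <- grepl(" ", terms, fixed = TRUE)
  terms[has_space] <- paste0('"', terms[has_space], '"')
  paste0("TS=(", paste(terms, collapse = paste0(" ", operator, " ")), ")")
}

build_topic_query(c("coastal", "dune", "nitrogen fixation"))
```

The resulting string can take the place of `paste0("TS=(", SearchTerms, ")")` in the search call.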
In this section, we run every DOI through a for loop to download the PDFs and put them in a folder called “SearchResultsTest”. If you have this in an R Project (recommended), the folder will be saved in the same place as this file. If you are running this as a regular script somehow, it will be in your working directory.
The VPN now needs to be turned OFF
This part takes a while and has a success rate of roughly 70-80%: it manages to download about 7 out of every 10 PDFs for the results found
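The loop in this section relies on one download route failing without stopping everything else. The same idea can be written with base R's `tryCatch()`. The sketch below is a generic illustration, not the code used in this tutorial: `first_success()` is a hypothetical helper, and the two toy functions stand in for the real Sci-Hub and metagear calls.

```r
# Hypothetical helper: call each function in `attempts` in turn and return the
# first result that does not raise an error; NULL if every attempt fails.
# (A function that legitimately returns NULL is treated as a failure here.)
first_success <- function(attempts) {
  for (attempt in attempts) {
    result <- tryCatch(attempt(), error = function(e) NULL)
    if (!is.null(result)) return(result)
  }
  NULL
}

# Toy stand-ins for the two download routes
always_fails <- function() stop("source unavailable")
always_works <- function() "downloaded"

first_success(list(always_fails, always_works))
```

Unlike checking a source and then calling it a second time, this pattern calls each source at most once.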
try_catch <- function(exprs) {!inherits(try(eval(exprs)), "try-error")} #this allows the for-loop to proceed even if there is an error
dir.create(here("SearchResultsTest")) #Creates folder to put the pdfs
## Warning in dir.create(here("SearchResultsTest")): 'C:
## \Users\benny\OneDrive\Documents\_Davis\Projects\MetaAnalysis\SearchResultsTest'
## already exists
for (i in 1:nrow(pubs)) { # iterate through the list of search results
  fileDOI <- pubs[i, ]$doi # extract the DOI (NA when the record has none)
  filename <- paste(pubs[i, ]$title, ".pdf", sep = "") # title plus ".pdf"; note that the paths below append ".pdf" a second time
  filename2 <- pubs[i, ]$title # title only, used by PDF_download()
  try(ifelse(
    try_catch(download_paper(query = fileDOI, path = paste(here::here("SearchResultsTest"), "/", filename, ".pdf", sep = ""), open = FALSE)), # checks if Sci-Hub will work
    download_paper(query = fileDOI, path = paste(here::here("SearchResultsTest"), "/", filename, ".pdf", sep = ""), open = FALSE), # if it can, it downloads straight from Sci-Hub
    PDF_download(DOI = fileDOI, theFileName = filename2, directory = here::here("SearchResultsTest"), WindowsProxy = TRUE) # if Sci-Hub won't work, it tries this instead; may need to change 'WindowsProxy' to FALSE if not on Windows
  ))
}
## Error in parse_url(.) : length(url) == 1 is not TRUE
## Collecting PDF from DOI: NA
## Extraction 1 of 2: HTML script.... cannot open: HTTP status was '404 Not Found'
## Extraction 2 of 2: PDF download... skipped
## Warning in file(con, "wb"): cannot open file 'C:/Users/benny/OneDrive/Documents/
## _Davis/Projects/MetaAnalysis/SearchResultsTest/Does atmospheric nitrogen
## deposition lead to greater nitrogen and carbon accumulation in coastal sand
## dunes?.pdf.pdf': Invalid argument
## Error in file(con, "wb") : cannot open the connection
## Collecting PDF from DOI: 10.1016/j.biocon.2016.12.007
## Extraction 1 of 2: HTML script.... successful
## Extraction 2 of 2: PDF download... failed, connections too slow or files not PDF format
## i Cite as: Provoost, S., Jones, M. L. M., & Edmondson, S. E. (2009).
## Changes in landscape and vegetation of coastal dunes in northwest Europe: a review. Journal of Coastal Conservation, 15(1), 207–226.
## doi:10.1007/s11852-009-0068-5i Cite as: Provoost, S., Jones, M. L. M., & Edmondson, S. E. (2009).
## Changes in landscape and vegetation of coastal dunes in northwest Europe: a review. Journal of Coastal Conservation, 15(1), 207–226.
## doi:10.1007/s11852-009-0068-5
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] :
## replacement has length zero
## i Cite as:
## Rodriguez-Echeverria, S., Crisostomo, J. A., & Freitas, H. (2007).
## Genetic Diversity of Rhizobia Associated with Acacia longifolia in Two Stages of Invasion of Coastal Sand Dunes. Applied and Environmental Microbiology, 73(15), 5066–5070.
## doi:10.1128/aem.00613-07
## i Cite as:
## Rodriguez-Echeverria, S., Crisostomo, J. A., & Freitas, H. (2007).
## Genetic Diversity of Rhizobia Associated with Acacia longifolia in Two Stages of Invasion of Coastal Sand Dunes. Applied and Environmental Microbiology, 73(15), 5066–5070.
## doi:10.1128/aem.00613-07
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] :
## replacement has length zero
## i Cite as:
## Jones, M. L. M., Sowerby, A., Williams, D. L., & Jones, R. E. (2008).
## Factors controlling soil development in sand dunes: evidence from a coastal dune soil chronosequence. Plant and Soil, 307(1-2), 219–234.
## doi:10.1007/s11104-008-9601-9
## i Cite as:
## Jones, M. L. M., Sowerby, A., Williams, D. L., & Jones, R. E. (2008).
## Factors controlling soil development in sand dunes: evidence from a coastal dune soil chronosequence. Plant and Soil, 307(1-2), 219–234.
## doi:10.1007/s11104-008-9601-9
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] :
## replacement has length zero
## Error in parse_url(.) : length(url) == 1 is not TRUE
## Collecting PDF from DOI: NA
## Extraction 1 of 2: HTML script.... cannot open: HTTP status was '404 Not Found'
## Extraction 2 of 2: PDF download... skipped
## i Cite as: Kooijman, A. M., Lubbers, I., & van Til, M. (2009).
## Iron-rich dune grasslands: Relations between soil organic matter and sorption of Fe and P. Environmental Pollution, 157(11), 3158–3165.
## doi:10.1016/j.envpol.2009.05.022i Cite as: Kooijman, A. M., Lubbers, I., & van Til, M. (2009).
## Iron-rich dune grasslands: Relations between soil organic matter and sorption of Fe and P. Environmental Pollution, 157(11), 3158–3165.
## doi:10.1016/j.envpol.2009.05.022
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] :
## replacement has length zero
## i Cite as:
## Arun, A. B., & Sridhar, K. R. (2004).
## Symbiotic performance of fast-growing rhizobia isolated from the coastal sand dune legumes of west coast of India. Biology and Fertility of Soils, 40(6), 435–439.
## doi:10.1007/s00374-004-0800-0
## i Cite as:
## Arun, A. B., & Sridhar, K. R. (2004).
## Symbiotic performance of fast-growing rhizobia isolated from the coastal sand dune legumes of west coast of India. Biology and Fertility of Soils, 40(6), 435–439.
## doi:10.1007/s00374-004-0800-0
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] :
## replacement has length zero
## i Cite as:
## Hanslin, H. M., & Kollmann, J. (2016).
## Positive responses of coastal dune plants to soil conditioning by the invasive Lupinus nootkatensis. Acta Oecologica, 77, 1–9.
## doi:10.1016/j.actao.2016.08.007
## i Cite as:
## Hanslin, H. M., & Kollmann, J. (2016).
## Positive responses of coastal dune plants to soil conditioning by the invasive Lupinus nootkatensis. Acta Oecologica, 77, 1–9.
## doi:10.1016/j.actao.2016.08.007
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] :
## replacement has length zero
## Error in parse_url(.) : length(url) == 1 is not TRUE
## Collecting PDF from DOI: 10.1007/s12237-022-01052-2
## Extraction 1 of 2: HTML script.... successful
## Extraction 2 of 2: PDF download... successful
## i Cite as: Kooijman, A. M., van Til, M., Noordijk, E., Remke, E., & Kalbitz, K. (2017). Nitrogen deposition and grass encroachment in calcareous and acidic Grey dunes (H2130) in NW-Europe. Biological Conservation, 212, 406–415. doi:10.1016/j.biocon.2016.08.009i Cite as: Kooijman, A. M., van Til, M., Noordijk, E., Remke, E., & Kalbitz, K. (2017). Nitrogen deposition and grass encroachment in calcareous and acidic Grey dunes (H2130) in NW-Europe. Biological Conservation, 212, 406–415. doi:10.1016/j.biocon.2016.08.009
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] :
## replacement has length zero
## i Cite as:
## Kim, D., & Yu, K. B. (2008).
## A conceptual model of coastal dune ecology synthesizing spatial gradients of vegetation, soil, and geomorphology. Plant Ecology, 202(1), 135–148.
## doi:10.1007/s11258-008-9456-4
## i Cite as:
## Kim, D., & Yu, K. B. (2008).
## A conceptual model of coastal dune ecology synthesizing spatial gradients of vegetation, soil, and geomorphology. Plant Ecology, 202(1), 135–148.
## doi:10.1007/s11258-008-9456-4
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] :
## replacement has length zero
## i Cite as:
## Sridhar, K. R., Arun, A. B., Narula, N., Deubel, A., & Merbach, W. (2005).
## Patterns of Sole-Carbon-Source Utilization by Fast-Growing Coastal Sand Dune Rhizobia of the Southwest Coast of India. Engineering in Life Sciences, 5(5), 425–430.
## doi:10.1002/elsc.200520091
## i Cite as:
## Sridhar, K. R., Arun, A. B., Narula, N., Deubel, A., & Merbach, W. (2005).
## Patterns of Sole-Carbon-Source Utilization by Fast-Growing Coastal Sand Dune Rhizobia of the Southwest Coast of India. Engineering in Life Sciences, 5(5), 425–430.
## doi:10.1002/elsc.200520091
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] :
## replacement has length zero
## i Cite as:
## Rodríguez-Echeverría, S. (2010).
## Rhizobial hitchhikers from Down Under: invasional meltdown in a plant-bacteria mutualism? Journal of Biogeography.
## doi:10.1111/j.1365-2699.2010.02284.x
## i Cite as:
## Rodríguez-Echeverría, S. (2010).
## Rhizobial hitchhikers from Down Under: invasional meltdown in a plant-bacteria mutualism? Journal of Biogeography.
## doi:10.1111/j.1365-2699.2010.02284.x
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] :
## replacement has length zero
## i Cite as:
## Hellmann, C., Sutter, R., Rascher, K. G., Máguas, C., Correia, O., & Werner, C. (2011).
## Impact of an exotic N2-fixing Acacia on composition and N status of a native Mediterranean community. Acta Oecologica, 37(1), 43–50.
## doi:10.1016/j.actao.2010.11.005
## i Cite as:
## Hellmann, C., Sutter, R., Rascher, K. G., Máguas, C., Correia, O., & Werner, C. (2011).
## Impact of an exotic N2-fixing Acacia on composition and N status of a native Mediterranean community. Acta Oecologica, 37(1), 43–50.
## doi:10.1016/j.actao.2010.11.005
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] :
## replacement has length zero
## Error in parse_url(.) : length(url) == 1 is not TRUE
## Collecting PDF from DOI: NA
## Extraction 1 of 2: HTML script.... cannot open: HTTP status was '404 Not Found'
## Extraction 2 of 2: PDF download... skipped
## Error in open.connection(con, "rb") : Could not resolve host: downloads
## Collecting PDF from DOI: 10.1002/ldr.4078
## Extraction 1 of 2: HTML script.... successful
## Extraction 2 of 2: PDF download... failed, url connections too slow or unavailable
## Error in parse_url(.) : length(url) == 1 is not TRUE
## Collecting PDF from DOI: NA
## Extraction 1 of 2: HTML script.... cannot open: HTTP status was '404 Not Found'
## Extraction 2 of 2: PDF download... skipped
## i Cite as: Selami, N., Auriac, M.-C., Catrice, O., Capela, D., Kaid-Harche, M., & Timmers, T. (2014).
## Morphology and anatomy of root nodules of Retama monosperma (L.)Boiss. Plant and Soil, 379(1-2), 109–119.
## doi:10.1007/s11104-014-2045-5i Cite as: Selami, N., Auriac, M.-C., Catrice, O., Capela, D., Kaid-Harche, M., & Timmers, T. (2014).
## Morphology and anatomy of root nodules of Retama monosperma (L.)Boiss. Plant and Soil, 379(1-2), 109–119.
## doi:10.1007/s11104-014-2045-5
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] :
## replacement has length zero
## i Cite as:
## Bolhuis, H., Fillinger, L., & Stal, L. J. (2013).
## Coastal Microbial Mat Diversity along a Natural Salinity Gradient. PLoS ONE, 8(5), e63166.
## doi:10.1371/journal.pone.0063166
## i Cite as:
## Bolhuis, H., Fillinger, L., & Stal, L. J. (2013).
## Coastal Microbial Mat Diversity along a Natural Salinity Gradient. PLoS ONE, 8(5), e63166.
## doi:10.1371/journal.pone.0063166
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] :
## replacement has length zero
## Error in open.connection(con, "rb") : Could not resolve host: downloads
## Collecting PDF from DOI: 10.1007/s13199-021-00765-5
## Extraction 1 of 2: HTML script.... successful
## Extraction 2 of 2: PDF download... successful
## Error in open.connection(con, "rb") : Could not resolve host: downloads
## Collecting PDF from DOI: 10.1038/s41598-019-45490-8
## Extraction 1 of 2: HTML script.... successful
## Extraction 2 of 2: PDF download... successful
## i Cite as: Birnbaum, C., Bissett, A., Teste, F. P., & Laliberté, E. (2018).
## Symbiotic N2-Fixer Community Composition, but Not Diversity, Shifts in Nodules of a Single Host Legume Across a 2-Million-Year Dune Chronosequence. Microbial Ecology.
## doi:10.1007/s00248-018-1185-1i Cite as: Birnbaum, C., Bissett, A., Teste, F. P., & Laliberté, E. (2018).
## Symbiotic N2-Fixer Community Composition, but Not Diversity, Shifts in Nodules of a Single Host Legume Across a 2-Million-Year Dune Chronosequence. Microbial Ecology.
## doi:10.1007/s00248-018-1185-1
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] :
## replacement has length zero
## i Cite as:
## Emery, S. M., & Rudgers, J. A. (2011).
## Beach Restoration Efforts Influenced by Plant Variety, Soil Inoculum, and Site Effects. Journal of Coastal Research, 274, 636–644.
## doi:10.2112/jcoastres-d-10-00120.1
## i Cite as:
## Emery, S. M., & Rudgers, J. A. (2011).
## Beach Restoration Efforts Influenced by Plant Variety, Soil Inoculum, and Site Effects. Journal of Coastal Research, 274, 636–644.
## doi:10.2112/jcoastres-d-10-00120.1
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] :
## replacement has length zero
## i Cite as:
## Werner, C., Zumkier, U., Beyschlag, W., & Máguas, C. (2009).
## High competitiveness of a resource demanding invasive acacia under low resource supply. Plant Ecology, 206(1), 83–96.
## doi:10.1007/s11258-009-9625-0
## i Cite as:
## Werner, C., Zumkier, U., Beyschlag, W., & Máguas, C. (2009).
## High competitiveness of a resource demanding invasive acacia under low resource supply. Plant Ecology, 206(1), 83–96.
## doi:10.1007/s11258-009-9625-0
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] :
## replacement has length zero
Output of Pdf Downloading
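Two things in the log above are worth noting: one file name ends in `.pdf.pdf`, and that same file could not be opened ("Invalid argument"), most likely because the title contains a `?`, which Windows does not allow in file names. One way to avoid both is to clean the title before building the path. A minimal sketch, assuming a hypothetical helper `safe_pdf_name()` that is not part of the original script:

```r
# Hypothetical helper: turn a paper title into a file name that avoids the
# characters Windows forbids and ends in exactly one ".pdf" extension.
safe_pdf_name <- function(title, max_chars = 150) {
  name <- gsub('[<>:"/\\\\|?*]', "", title, perl = TRUE) # drop forbidden characters
  name <- gsub("\\s+", " ", trimws(name))                # collapse runs of whitespace
  name <- sub("\\.pdf$", "", name, ignore.case = TRUE)   # avoid ".pdf.pdf"
  paste0(substr(name, 1, max_chars), ".pdf")
}

safe_pdf_name("Does atmospheric nitrogen deposition lead to greater nitrogen and carbon accumulation in coastal sand dunes?")
```

The `max_chars` cut-off is an arbitrary choice to keep paths short; adjust it to suit your system.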
Okay, this is just cool! I just wanted to make some neat word clouds. Feel free to skip this part. It takes all the keywords found in the search and formats them for use. I do the same with the titles from the results. For each, one function creates a static image and the other creates an interactive tool where hovering over a word tells you how many times it occurred.
Code shamelessly stolen from here, and it is magic! https://cran.r-project.org/web/packages/wordcloud2/vignettes/wordcloud.html
#### Word cloud from key words
kw<-pubs$keywords
# Create a corpus
docs <- Corpus(VectorSource(kw))
docs <- docs %>%
tm_map(removeNumbers) %>%
tm_map(removePunctuation) %>%
tm_map(stripWhitespace)
## Warning in tm_map.SimpleCorpus(., removeNumbers): transformation drops documents
## Warning in tm_map.SimpleCorpus(., removePunctuation): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., stripWhitespace): transformation drops
## documents
docs <- tm_map(docs, content_transformer(tolower))
## Warning in tm_map.SimpleCorpus(docs, content_transformer(tolower)):
## transformation drops documents
docs <- tm_map(docs, removeWords, stopwords("english"))
## Warning in tm_map.SimpleCorpus(docs, removeWords, stopwords("english")):
## transformation drops documents
dtm <- TermDocumentMatrix(docs)
matrix <- as.matrix(dtm)
words <- sort(rowSums(matrix),decreasing=TRUE)
df <- data.frame(word = names(words),freq=words)
set.seed(27)
wordcloud(words = df$word, freq = df$freq, min.freq = 1,max.words=200, random.order=FALSE, rot.per=0.35, colors=brewer.pal(8, "Dark2"))
## Warning in wordcloud(words = df$word, freq = df$freq, min.freq = 1, max.words =
## 200, : mycorrhizae could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = df$word, freq = df$freq, min.freq = 1, max.words =
## 200, : competition could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = df$word, freq = df$freq, min.freq = 1, max.words =
## 200, : mediterranean could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = df$word, freq = df$freq, min.freq = 1, max.words =
## 200, : progressive could not be fit on page. It will not be plotted.
wordcloud2(data=df, fontFamily = 'Times', color = "random-dark")
kw2<-pubs$title
docs2 <- Corpus(VectorSource(kw2))
docs2 <- docs2 %>%
tm_map(removeNumbers) %>%
tm_map(removePunctuation) %>%
tm_map(stripWhitespace)
## Warning in tm_map.SimpleCorpus(., removeNumbers): transformation drops documents
## Warning in tm_map.SimpleCorpus(., removePunctuation): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., stripWhitespace): transformation drops
## documents
docs2 <- tm_map(docs2, content_transformer(tolower))
## Warning in tm_map.SimpleCorpus(docs2, content_transformer(tolower)):
## transformation drops documents
docs2 <- tm_map(docs2, removeWords, stopwords("english"))
## Warning in tm_map.SimpleCorpus(docs2, removeWords, stopwords("english")):
## transformation drops documents
dtm2 <- TermDocumentMatrix(docs2)
matrix2 <- as.matrix(dtm2)
words2 <- sort(rowSums(matrix2),decreasing=TRUE)
df2 <- data.frame(word = names(words2),freq=words2)
set.seed(27)
wordcloud(words = df2$word, freq = df2$freq, min.freq = 1,max.words=200, random.order=FALSE, rot.per=0.35, colors=brewer.pal(8, "Dark2"))
wordcloud2(data=df2, fontFamily = 'Times', color = "random-dark")
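If you are curious what the `df` and `df2` data frames passed to the word cloud functions actually hold, a similar word-frequency table can be approximated in base R. This is a simplified sketch, not the tm pipeline above: `count_words()` is a hypothetical helper, it does not remove stop words, and its counts may differ somewhat from tm's.

```r
# Hypothetical helper: lower-case the text, replace anything that is not a
# plain letter with a space, split on spaces, and count each word.
count_words <- function(text) {
  text <- tolower(text[!is.na(text)])
  text <- gsub("[^a-z ]", " ", text)        # keep only letters and spaces
  words <- unlist(strsplit(text, " +"))
  words <- words[nchar(words) > 0]
  counts <- sort(table(words), decreasing = TRUE)
  data.frame(word = names(counts), freq = as.integer(counts))
}

count_words(c("Coastal dune soils", "Dune nitrogen fixation", NA))
```

The result has the same two columns, `word` and `freq`, that `wordcloud()` and `wordcloud2()` are given above.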
This is an important final step. This will upload the metadata and all the PDFs (as a zipped folder) to your Google Drive automatically. It will prompt you to sign in. I recommend using your UC Davis Google Drive, but it does not totally matter.
#### RUN THIS IF THE FIRST TIME ####
#gs4_auth() #signs you in so it can upload to google sheets
#### This gives you the ability to alter sheets on your google drive ###
#gs4_create("SearchResultsDataTest", sheets = list(data = pubs)) #uploads the list of publications and all the info to a Google Sheet
#zip(zipfile= "resultsTest.zip", files = here("SearchResultsTest")) #zips your pdfs
###this will ask you to sign in again, which is annoying, but deal with it.
#drive_upload(media = "resultsTest.zip") #uploads a zipped file to your google drive.
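Before zipping and uploading, it can help to check how many PDFs actually made it into the folder, since not every download succeeds. A small sketch under stated assumptions: `download_summary()` is a hypothetical helper, and you supply the folder and the expected count yourself (for this tutorial they would be `here("SearchResultsTest")` and `nrow(pubs)`).

```r
# Hypothetical helper: count the PDFs in `dir` and compare with the number of
# search results that were expected. It counts by file extension only and
# does not check that the files are valid PDFs.
download_summary <- function(dir, n_expected) {
  pdfs <- list.files(dir, pattern = "\\.pdf$", ignore.case = TRUE)
  list(n_pdf = length(pdfs),
       n_expected = n_expected,
       share = length(pdfs) / n_expected)
}

# Self-contained demonstration using a temporary folder with two dummy files
demo_dir <- file.path(tempdir(), "demo_results")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("paper_a.pdf", "paper_b.pdf")))
download_summary(demo_dir, n_expected = 4)
```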
Then you can get started on reading and extracting!
I hope this worked for you! Let me know if you run into any troubles or have any ways for me to make it better. Thank you so much for giving it a look. Let me know if you actually end up using this and send me your word clouds!
@NotBenRivera on Twitter or email me at benrivera@ucdavis.edu