Introduction

-To access this, go to my website: https://tinyurl.com/meta-R

-Download the R markdown File here: https://drive.google.com/file/d/1Nq0RdDQ1-dL4Kxn5J6QgfXodWFoEZnlI/view?usp=sharing

-@NotBenRivera on Twitter

-Live tutorial at 2:30 Pacific on 10.18.2022 https://ucdavis.zoom.us/s/97766151914

Example Word Cloud

Goals

Overview of how it works

I strongly recommend downloading the script from the link above and running it line by line rather than trying to run it all at once!

Packages

Below are the packages you will use here today! If you don’t have them all installed, uncomment the lines that start with just one “#” and run them. Most are on CRAN, but two (scihubr and rwos) are installed from GitHub, which requires the packages remotes and devtools, so you may have to install those too.

#####Install from CRAN#######
#install.packages(c("tidyverse", "metagear", "googlesheets4", "googledrive", "here", "wordcloud", "wordcloud2", "tm"))

#####Install from GitHub####
#if (!require("remotes") ){install.packages("remotes")}
#remotes::install_github("netique/scihubr")

#devtools::install_github("juba/rwos") #may need to install 'devtools' from CRAN

#####Load Packages########
library(tidyverse)
library(rwos) #
library(metagear)
library(scihubr) #
library(googlesheets4)
library(googledrive)
library(here)
library(wordcloud)
library(wordcloud2)
library(tm)
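If you are not sure which of these packages you already have, a small helper can list the missing ones before you install anything. This is a minimal sketch in base R: `missing_packages()` is a hypothetical helper name that is not part of the original script, and the two GitHub packages (rwos and scihubr) would still need the remotes/devtools lines above.

```r
# Hypothetical helper (not from the original script): return the packages
# in `pkgs` that are not installed in any library on this machine.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

cran_pkgs <- c("tidyverse", "metagear", "googlesheets4", "googledrive",
               "here", "wordcloud", "wordcloud2", "tm")

to_install <- missing_packages(cran_pkgs)
print(to_install)  # an empty result means everything is already installed

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

Installing only the missing packages saves time and avoids reinstalling packages you are already using in other projects.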

Part 2: Stealing papers from Sci-hub

In this section, we run all the DOIs through a for loop to download the PDFs and put them in a folder called “SearchResultsTest”. If you have this in an R Project (recommended), this folder will be saved in the same place as this file. If you are running this as a regular script instead, it will be in your working directory.

Your VPN now needs to be turned OFF

This part takes a while and has a roughly 70-80% success rate: it manages to download about 7 out of every 10 PDFs in the search results

Code Downloads

try_catch <- function(exprs) {!inherits(try(eval(exprs)), "try-error")} #this allows the for-loop to proceed even if there is an error

dir.create(here("SearchResultsTest")) #Creates folder to put the pdfs
## Warning in dir.create(here("SearchResultsTest")): 'C:
## \Users\benny\OneDrive\Documents\_Davis\Projects\MetaAnalysis\SearchResultsTest'
## already exists
for (i in 1:nrow(pubs)){ #iterate through list of search results
  fileDOI <- pubs[i,]$doi #extracts DOI
  filename <- paste(pubs[i,]$title, ".pdf", sep = "") #extract file name to be used to create file
  filename2 <- pubs[i,]$title #just gets file name
  scihubPath <- paste(here::here("SearchResultsTest"), "/", filename, ".pdf", sep = "") #where the Sci-hub download is saved
  try(ifelse(
    try_catch(download_paper(query = fileDOI, path = scihubPath, open = FALSE)), #checks if Sci-hub will work
    download_paper(query = fileDOI, path = scihubPath, open = FALSE), #if it can, it downloads straight from sci-hub
    PDF_download(DOI = fileDOI, theFileName = filename2, directory = here::here("SearchResultsTest"), WindowsProxy = TRUE) #if sci-hub won't work, it tries this instead. May need to change 'WindowsProxy' to FALSE if not on Windows.
  ))
}
## Error in parse_url(.) : length(url) == 1 is not TRUE
## Collecting PDF from DOI: NA
##          Extraction 1 of 2: HTML script.... cannot open: HTTP status was '404 Not Found'
##          Extraction 2 of 2: PDF download... skipped
## Warning in file(con, "wb"): cannot open file 'C:/Users/benny/OneDrive/Documents/
## _Davis/Projects/MetaAnalysis/SearchResultsTest/Does atmospheric nitrogen
## deposition lead to greater nitrogen and carbon accumulation in coastal sand
## dunes?.pdf.pdf': Invalid argument
## Error in file(con, "wb") : cannot open the connection
## Collecting PDF from DOI: 10.1016/j.biocon.2016.12.007
##          Extraction 1 of 2: HTML script.... successful
##          Extraction 2 of 2: PDF download... failed, connections too slow or files not PDF format
## i Cite as:  Provoost, S., Jones, M. L. M., & Edmondson, S. E. (2009).
##   Changes in landscape and vegetation of coastal dunes in northwest Europe: a review. Journal of Coastal Conservation, 15(1), 207–226.
##   doi:10.1007/s11852-009-0068-5i Cite as:  Provoost, S., Jones, M. L. M., & Edmondson, S. E. (2009).
##   Changes in landscape and vegetation of coastal dunes in northwest Europe: a review. Journal of Coastal Conservation, 15(1), 207–226.
##   doi:10.1007/s11852-009-0068-5
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] : 
##   replacement has length zero
## i Cite as:
##   Rodriguez-Echeverria, S., Crisostomo, J. A., & Freitas, H. (2007).
##   Genetic Diversity of Rhizobia Associated with Acacia longifolia in Two Stages of Invasion of Coastal Sand Dunes. Applied and Environmental Microbiology, 73(15), 5066–5070.
##   doi:10.1128/aem.00613-07
## i Cite as:
##   Rodriguez-Echeverria, S., Crisostomo, J. A., & Freitas, H. (2007).
##   Genetic Diversity of Rhizobia Associated with Acacia longifolia in Two Stages of Invasion of Coastal Sand Dunes. Applied and Environmental Microbiology, 73(15), 5066–5070.
##   doi:10.1128/aem.00613-07
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] : 
##   replacement has length zero
## i Cite as:
##   Jones, M. L. M., Sowerby, A., Williams, D. L., & Jones, R. E. (2008).
##   Factors controlling soil development in sand dunes: evidence from a coastal dune soil chronosequence. Plant and Soil, 307(1-2), 219–234.
##   doi:10.1007/s11104-008-9601-9
## i Cite as:
##   Jones, M. L. M., Sowerby, A., Williams, D. L., & Jones, R. E. (2008).
##   Factors controlling soil development in sand dunes: evidence from a coastal dune soil chronosequence. Plant and Soil, 307(1-2), 219–234.
##   doi:10.1007/s11104-008-9601-9
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] : 
##   replacement has length zero
## Error in parse_url(.) : length(url) == 1 is not TRUE
## Collecting PDF from DOI: NA
##          Extraction 1 of 2: HTML script.... cannot open: HTTP status was '404 Not Found'
##          Extraction 2 of 2: PDF download... skipped
## i Cite as:  Kooijman, A. M., Lubbers, I., & van Til, M. (2009).
##   Iron-rich dune grasslands: Relations between soil organic matter and sorption of Fe and P. Environmental Pollution, 157(11), 3158–3165.
##   doi:10.1016/j.envpol.2009.05.022i Cite as:  Kooijman, A. M., Lubbers, I., & van Til, M. (2009).
##   Iron-rich dune grasslands: Relations between soil organic matter and sorption of Fe and P. Environmental Pollution, 157(11), 3158–3165.
##   doi:10.1016/j.envpol.2009.05.022
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] : 
##   replacement has length zero
## i Cite as:
##   Arun, A. B., & Sridhar, K. R. (2004).
##   Symbiotic performance of fast-growing rhizobia isolated from the coastal sand dune legumes of west coast of India. Biology and Fertility of Soils, 40(6), 435–439.
##   doi:10.1007/s00374-004-0800-0
## i Cite as:
##   Arun, A. B., & Sridhar, K. R. (2004).
##   Symbiotic performance of fast-growing rhizobia isolated from the coastal sand dune legumes of west coast of India. Biology and Fertility of Soils, 40(6), 435–439.
##   doi:10.1007/s00374-004-0800-0
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] : 
##   replacement has length zero
## i Cite as:
##   Hanslin, H. M., & Kollmann, J. (2016).
##   Positive responses of coastal dune plants to soil conditioning by the invasive Lupinus nootkatensis. Acta Oecologica, 77, 1–9.
##   doi:10.1016/j.actao.2016.08.007
## i Cite as:
##   Hanslin, H. M., & Kollmann, J. (2016).
##   Positive responses of coastal dune plants to soil conditioning by the invasive Lupinus nootkatensis. Acta Oecologica, 77, 1–9.
##   doi:10.1016/j.actao.2016.08.007
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] : 
##   replacement has length zero
## Error in parse_url(.) : length(url) == 1 is not TRUE
## Collecting PDF from DOI: 10.1007/s12237-022-01052-2
##          Extraction 1 of 2: HTML script.... successful
##          Extraction 2 of 2: PDF download... successful
## i Cite as:  Kooijman, A. M., van Til, M., Noordijk, E., Remke, E., & Kalbitz, K. (2017). Nitrogen deposition and grass encroachment in calcareous and acidic Grey dunes (H2130) in NW-Europe. Biological Conservation, 212, 406–415. doi:10.1016/j.biocon.2016.08.009i Cite as:  Kooijman, A. M., van Til, M., Noordijk, E., Remke, E., & Kalbitz, K. (2017). Nitrogen deposition and grass encroachment in calcareous and acidic Grey dunes (H2130) in NW-Europe. Biological Conservation, 212, 406–415. doi:10.1016/j.biocon.2016.08.009
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] : 
##   replacement has length zero
## i Cite as:
##   Kim, D., & Yu, K. B. (2008).
##   A conceptual model of coastal dune ecology synthesizing spatial gradients of vegetation, soil, and geomorphology. Plant Ecology, 202(1), 135–148.
##   doi:10.1007/s11258-008-9456-4
## i Cite as:
##   Kim, D., & Yu, K. B. (2008).
##   A conceptual model of coastal dune ecology synthesizing spatial gradients of vegetation, soil, and geomorphology. Plant Ecology, 202(1), 135–148.
##   doi:10.1007/s11258-008-9456-4
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] : 
##   replacement has length zero
## i Cite as:
##   Sridhar, K. R., Arun, A. B., Narula, N., Deubel, A., & Merbach, W. (2005).
##   Patterns of Sole-Carbon-Source Utilization by Fast-Growing Coastal Sand Dune Rhizobia of the Southwest Coast of India. Engineering in Life Sciences, 5(5), 425–430.
##   doi:10.1002/elsc.200520091
## i Cite as:
##   Sridhar, K. R., Arun, A. B., Narula, N., Deubel, A., & Merbach, W. (2005).
##   Patterns of Sole-Carbon-Source Utilization by Fast-Growing Coastal Sand Dune Rhizobia of the Southwest Coast of India. Engineering in Life Sciences, 5(5), 425–430.
##   doi:10.1002/elsc.200520091
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] : 
##   replacement has length zero
## i Cite as:
##   Rodríguez-Echeverría, S. (2010).
##   Rhizobial hitchhikers from Down Under: invasional meltdown in a plant-bacteria mutualism? Journal of Biogeography.
##   doi:10.1111/j.1365-2699.2010.02284.x
## i Cite as:
##   Rodríguez-Echeverría, S. (2010).
##   Rhizobial hitchhikers from Down Under: invasional meltdown in a plant-bacteria mutualism? Journal of Biogeography.
##   doi:10.1111/j.1365-2699.2010.02284.x
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] : 
##   replacement has length zero
## i Cite as:
##   Hellmann, C., Sutter, R., Rascher, K. G., Máguas, C., Correia, O., & Werner, C. (2011).
##   Impact of an exotic N2-fixing Acacia on composition and N status of a native Mediterranean community. Acta Oecologica, 37(1), 43–50.
##   doi:10.1016/j.actao.2010.11.005
## i Cite as:
##   Hellmann, C., Sutter, R., Rascher, K. G., Máguas, C., Correia, O., & Werner, C. (2011).
##   Impact of an exotic N2-fixing Acacia on composition and N status of a native Mediterranean community. Acta Oecologica, 37(1), 43–50.
##   doi:10.1016/j.actao.2010.11.005
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] : 
##   replacement has length zero
## Error in parse_url(.) : length(url) == 1 is not TRUE
## Collecting PDF from DOI: NA
##          Extraction 1 of 2: HTML script.... cannot open: HTTP status was '404 Not Found'
##          Extraction 2 of 2: PDF download... skipped
## Error in open.connection(con, "rb") : Could not resolve host: downloads
## Collecting PDF from DOI: 10.1002/ldr.4078
##          Extraction 1 of 2: HTML script.... successful
##          Extraction 2 of 2: PDF download... failed, url connections too slow or unavailable
## Error in parse_url(.) : length(url) == 1 is not TRUE
## Collecting PDF from DOI: NA
##          Extraction 1 of 2: HTML script.... cannot open: HTTP status was '404 Not Found'
##          Extraction 2 of 2: PDF download... skipped
## i Cite as:  Selami, N., Auriac, M.-C., Catrice, O., Capela, D., Kaid-Harche, M., & Timmers, T. (2014).
##   Morphology and anatomy of root nodules of Retama monosperma (L.)Boiss. Plant and Soil, 379(1-2), 109–119.
##   doi:10.1007/s11104-014-2045-5i Cite as:  Selami, N., Auriac, M.-C., Catrice, O., Capela, D., Kaid-Harche, M., & Timmers, T. (2014).
##   Morphology and anatomy of root nodules of Retama monosperma (L.)Boiss. Plant and Soil, 379(1-2), 109–119.
##   doi:10.1007/s11104-014-2045-5
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] : 
##   replacement has length zero
## i Cite as:
##   Bolhuis, H., Fillinger, L., & Stal, L. J. (2013).
##   Coastal Microbial Mat Diversity along a Natural Salinity Gradient. PLoS ONE, 8(5), e63166.
##   doi:10.1371/journal.pone.0063166
## i Cite as:
##   Bolhuis, H., Fillinger, L., & Stal, L. J. (2013).
##   Coastal Microbial Mat Diversity along a Natural Salinity Gradient. PLoS ONE, 8(5), e63166.
##   doi:10.1371/journal.pone.0063166
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] : 
##   replacement has length zero
## Error in open.connection(con, "rb") : Could not resolve host: downloads
## Collecting PDF from DOI: 10.1007/s13199-021-00765-5
##          Extraction 1 of 2: HTML script.... successful
##          Extraction 2 of 2: PDF download... successful
## Error in open.connection(con, "rb") : Could not resolve host: downloads
## Collecting PDF from DOI: 10.1038/s41598-019-45490-8
##          Extraction 1 of 2: HTML script.... successful
##          Extraction 2 of 2: PDF download... successful
## i Cite as:  Birnbaum, C., Bissett, A., Teste, F. P., & Laliberté, E. (2018).
##   Symbiotic N2-Fixer Community Composition, but Not Diversity, Shifts in Nodules of a Single Host Legume Across a 2-Million-Year Dune Chronosequence. Microbial Ecology.
##   doi:10.1007/s00248-018-1185-1i Cite as:  Birnbaum, C., Bissett, A., Teste, F. P., & Laliberté, E. (2018).
##   Symbiotic N2-Fixer Community Composition, but Not Diversity, Shifts in Nodules of a Single Host Legume Across a 2-Million-Year Dune Chronosequence. Microbial Ecology.
##   doi:10.1007/s00248-018-1185-1
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] : 
##   replacement has length zero
## i Cite as:
##   Emery, S. M., & Rudgers, J. A. (2011).
##   Beach Restoration Efforts Influenced by Plant Variety, Soil Inoculum, and Site Effects. Journal of Coastal Research, 274, 636–644.
##   doi:10.2112/jcoastres-d-10-00120.1
## i Cite as:
##   Emery, S. M., & Rudgers, J. A. (2011).
##   Beach Restoration Efforts Influenced by Plant Variety, Soil Inoculum, and Site Effects. Journal of Coastal Research, 274, 636–644.
##   doi:10.2112/jcoastres-d-10-00120.1
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] : 
##   replacement has length zero
## i Cite as:
##   Werner, C., Zumkier, U., Beyschlag, W., & Máguas, C. (2009).
##   High competitiveness of a resource demanding invasive acacia under low resource supply. Plant Ecology, 206(1), 83–96.
##   doi:10.1007/s11258-009-9625-0
## i Cite as:
##   Werner, C., Zumkier, U., Beyschlag, W., & Máguas, C. (2009).
##   High competitiveness of a resource demanding invasive acacia under low resource supply. Plant Ecology, 206(1), 83–96.
##   doi:10.1007/s11258-009-9625-0
## Warning in rep(yes, length.out = len): 'x' is NULL so the result will be NULL
## Error in ans[ypos] <- rep(yes, length.out = len)[ypos] : 
##   replacement has length zero
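A few patterns in the output above are worth understanding. Each "Cite as" message appears twice and is followed by "replacement has length zero", which is consistent with `ifelse()` running the Sci-Hub download a second time and getting nothing back to store. The "Invalid argument" warning involves a file name containing "?" and ending in ".pdf.pdf". Rows with no DOI show up as "DOI: NA". Below is a hedged sketch of one way to tidy this up, not the original method: `safe_pdf_name()` and `download_with_fallback()` are hypothetical helpers, and the commented usage assumes the same `pubs`, `download_paper()` and `PDF_download()` used above.

```r
# Hypothetical helper: turn a paper title into a file name that Windows,
# macOS and Linux all accept, with exactly one ".pdf" extension.
safe_pdf_name <- function(title, max_chars = 150) {
  name <- gsub('[<>:"/\\\\|?*]', "", title)   # characters Windows rejects
  name <- gsub("\\s+", " ", trimws(name))     # collapse runs of whitespace
  name <- sub("\\.pdf$", "", name, ignore.case = TRUE)
  paste0(substr(name, 1, max_chars), ".pdf")
}

# Hypothetical helper: call `primary()`; if it throws an error, call
# `fallback()`. Each downloader runs at most once, and the return value
# says which one (if any) finished without an error.
download_with_fallback <- function(primary, fallback) {
  attempts <- list(primary = primary, fallback = fallback)
  for (label in names(attempts)) {
    ok <- tryCatch({ attempts[[label]](); TRUE }, error = function(e) FALSE)
    if (ok) return(label)
  }
  "failed"
}

# Usage with this tutorial's packages (commented out because it needs
# `pubs`, scihubr, metagear and a network connection):
# out_dir <- here::here("SearchResultsTest")
# dir.create(out_dir, showWarnings = FALSE)
# status <- character(nrow(pubs))
# for (i in seq_len(nrow(pubs))) {
#   doi <- pubs$doi[i]
#   if (is.na(doi)) { status[i] <- "no DOI"; next }
#   pdf_name <- safe_pdf_name(pubs$title[i])
#   status[i] <- download_with_fallback(
#     primary  = function() download_paper(query = doi,
#                                          path = file.path(out_dir, pdf_name),
#                                          open = FALSE),
#     fallback = function() PDF_download(DOI = doi, directory = out_dir,
#                                        theFileName = sub("\\.pdf$", "", pdf_name),
#                                        WindowsProxy = TRUE)
#   )
# }
# table(status)
```

Judging by the log above, `PDF_download()` seems to report a failed download as a message rather than an error, so a "fallback" status here only means the fallback ran without erroring. Check the folder to see which PDFs actually arrived.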

Resulting Folder

Output of Pdf Downloading

Part 3: Word Cloud time!

Okay, this is just cool! I just wanted to make some neat word clouds, so feel free to skip this part. The code takes all the keywords found in the search and formats them for plotting, then does the same with the titles from the results. For each one, wordcloud() creates a static image and wordcloud2() creates an interactive version where hovering over a word tells you how many times it occurred.

Code shamelessly stolen from here, and it is magic! https://cran.r-project.org/web/packages/wordcloud2/vignettes/wordcloud.html

Word Clouds made from key words

#### Word cloud from key words
kw<-pubs$keywords


# Create a corpus  
docs <- Corpus(VectorSource(kw))

docs <- docs  %>%
  tm_map(removeNumbers) %>%
  tm_map(removePunctuation) %>%
  tm_map(stripWhitespace)
## Warning in tm_map.SimpleCorpus(., removeNumbers): transformation drops documents
## Warning in tm_map.SimpleCorpus(., removePunctuation): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., stripWhitespace): transformation drops
## documents
docs <- tm_map(docs, content_transformer(tolower))
## Warning in tm_map.SimpleCorpus(docs, content_transformer(tolower)):
## transformation drops documents
docs <- tm_map(docs, removeWords, stopwords("english"))
## Warning in tm_map.SimpleCorpus(docs, removeWords, stopwords("english")):
## transformation drops documents
dtm <- TermDocumentMatrix(docs) 
matrix <- as.matrix(dtm) 
words <- sort(rowSums(matrix),decreasing=TRUE) 
df <- data.frame(word = names(words),freq=words)

set.seed(27)
wordcloud(words = df$word, freq = df$freq, min.freq = 1,max.words=200, random.order=FALSE, rot.per=0.35,            colors=brewer.pal(8, "Dark2"))
## Warning in wordcloud(words = df$word, freq = df$freq, min.freq = 1, max.words =
## 200, : mycorrhizae could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = df$word, freq = df$freq, min.freq = 1, max.words =
## 200, : competition could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = df$word, freq = df$freq, min.freq = 1, max.words =
## 200, : mediterranean could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = df$word, freq = df$freq, min.freq = 1, max.words =
## 200, : progressive could not be fit on page. It will not be plotted.

wordcloud2(data=df, fontFamily = 'Times', color = "random-dark")
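If you want to see what the `df` table above actually contains, here is a base-R approximation of the same idea: clean the text, split it into words, and count them. This is only a sketch and not the tm pipeline itself: `term_frequencies()` is a hypothetical helper, the short stop-word list is a stand-in for `stopwords("english")`, and the two example titles are made up for illustration.

```r
# Hypothetical helper: count how often each word occurs in a character
# vector after lower-casing, dropping digits and punctuation, and removing
# a (deliberately tiny, illustrative) list of stop words.
term_frequencies <- function(text,
                             stop_words = c("the", "of", "and", "in", "a", "to")) {
  text   <- tolower(text[!is.na(text)])
  text   <- gsub("[[:digit:][:punct:]]+", " ", text)
  words  <- unlist(strsplit(text, "[[:space:]]+"))
  words  <- words[nzchar(words) & !(words %in% stop_words)]
  counts <- sort(table(words), decreasing = TRUE)
  data.frame(word = names(counts), freq = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Made-up titles, only to show the shape of the result
example_titles <- c("Soil nitrogen in coastal dunes",
                    "Coastal dune soil and the nitrogen cycle")
print(term_frequencies(example_titles))
```

The result has the same two columns (`word` and `freq`) that `wordcloud()` and `wordcloud2()` are given above, so it is a handy way to sanity-check which words will end up largest.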

Word cloud from titles

kw2<-pubs$title
docs2 <- Corpus(VectorSource(kw2))

docs2 <- docs2  %>%
  tm_map(removeNumbers) %>%
  tm_map(removePunctuation) %>%
  tm_map(stripWhitespace)
## Warning in tm_map.SimpleCorpus(., removeNumbers): transformation drops documents
## Warning in tm_map.SimpleCorpus(., removePunctuation): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., stripWhitespace): transformation drops
## documents
docs2 <- tm_map(docs2, content_transformer(tolower))
## Warning in tm_map.SimpleCorpus(docs2, content_transformer(tolower)):
## transformation drops documents
docs2 <- tm_map(docs2, removeWords, stopwords("english"))
## Warning in tm_map.SimpleCorpus(docs2, removeWords, stopwords("english")):
## transformation drops documents
dtm2 <- TermDocumentMatrix(docs2) 
matrix2 <- as.matrix(dtm2) 
words2 <- sort(rowSums(matrix2),decreasing=TRUE) 
df2 <- data.frame(word = names(words2),freq=words2)

set.seed(27)
wordcloud(words = df2$word, freq = df2$freq, min.freq = 1,max.words=200, random.order=FALSE, rot.per=0.35,            colors=brewer.pal(8, "Dark2"))

wordcloud2(data=df2, fontFamily = 'Times', color = "random-dark")

Part 4: Uploading results to google sheets

This is an important final step. It uploads the metadata (as a Google Sheet) and all the PDFs (as a zipped folder) to your Google Drive automatically. It will prompt you to sign in. I recommend using your UC Davis Google Drive, but it does not really matter which account you use.

#### RUN THIS IF THE FIRST TIME ####

#gs4_auth() #signs you in so it can upload to google sheets

#### This gives you the ability to alter sheets on your google drive ###

#gs4_create("SearchResultsDataTest", sheets = list(data = pubs)) #uploads list of publications and all the info to a google sheets

#zip(zipfile= "resultsTest.zip", files = here("SearchResultsTest")) #zips your pdfs

###this will ask you to sign in again, which is annoying, but deal with it.
#drive_upload(media = "resultsTest.zip") #uploads a zipped file to your google drive. 
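The second sign-in can usually be avoided by authenticating once with googledrive and handing that token to googlesheets4. The sketch below wraps the same steps in one function. `upload_results()` and its argument names are hypothetical and not part of the original script. It is untested here because it needs a Google account, so treat it as a starting point.

```r
# Hypothetical wrapper (not from the original script) around the upload
# steps above. Nothing runs until you call upload_results() yourself.
upload_results <- function(pubs, pdf_dir,
                           sheet_name = "SearchResultsDataTest",
                           zip_name   = "resultsTest.zip") {
  googledrive::drive_auth()                                    # sign in once
  googlesheets4::gs4_auth(token = googledrive::drive_token())  # reuse that token

  googlesheets4::gs4_create(sheet_name, sheets = list(data = pubs))  # metadata
  utils::zip(zipfile = zip_name, files = pdf_dir)                    # zip the PDFs
  googledrive::drive_upload(media = zip_name)                        # upload the zip
}

# Example call, assuming `pubs` and the PDF folder from the earlier parts:
# upload_results(pubs, here::here("SearchResultsTest"))
```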

Example Output

Then you can get started on reading and extracting! My personal set up for my Meta Analysis

That’s it!

I hope this worked for you! Let me know if you run into any troubles or have any ways for me to make it better. Thank you so much for giving it a look. Let me know if you actually end up using this and send me your word clouds!

@NotBenRivera on Twitter or email me at