Text Mining Applied to Impact Assessment Studies of Agricultural Public Policies
Since the early 1960s, when they began, impact assessment studies have become a recurring theme in the literature on science and technology policy, attracting the interest of researchers and of actors involved in innovation. Following the processes through which science and technology are produced, these studies have been approached along multiple dimensions, chiefly the social, economic, environmental, and technological impacts associated with different fields. Given that Brazil began conducting such studies in the 1980s, and that many works on the impact assessment of agricultural public policies have been indexed in databases over the last 40 years, the objectives of this research are (1) to identify the terms and methodologies most frequently used in these studies and (2) to identify the similarities among them, in order to strengthen the impact assessment activities conducted by an Embrapa team. Methodologically, a systematic mapping was carried out, applying text mining techniques (tokenization and topic modeling) to a structured textual dataset retrieved from the Scopus database. The results include a set of tokens, represented as unigrams, bigrams, and trigrams, that made it possible to identify the main subjects addressed in the works. In addition, about 90 distinct methodologies were identified. For the similarity analysis, the works were grouped into 30 clusters (k = 30), which were organized in a panel to facilitate the visualization and interpretation of these results.
To analyze a representative corpus of studies on the impact assessment of agricultural public policies, published over the last 40 years and indexed in an international database, in order to identify themes, methodologies, and similarities among the works.
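The abstract reports grouping the works by similarity into k = 30 groups, but this section does not name the similarity measure or the clustering algorithm. Purely as an illustration, the sketch below assumes cosine similarity between term-frequency vectors followed by average-linkage hierarchical clustering cut into k groups; the helper `agrupar_resumos` and the toy abstracts are hypothetical, and the study may well have used a different technique.

```r
# Hypothetical sketch (assumed method): cosine similarity + hierarchical clustering
agrupar_resumos <- function(textos, k) {
  # Split each text on whitespace and build a document-term count matrix
  palavras <- strsplit(textos, split = "\\s+")
  vocabulario <- sort(unique(unlist(palavras)))
  dtm <- t(vapply(palavras,
                  function(p) as.numeric(table(factor(p, levels = vocabulario))),
                  numeric(length(vocabulario))))
  # Cosine similarity: dot products divided by the product of the vector norms
  normas <- sqrt(rowSums(dtm^2))
  similaridade <- (dtm %*% t(dtm)) / (normas %o% normas)
  # Turn similarity into a distance and cut the dendrogram into k groups
  arvore <- hclust(as.dist(1 - similaridade), method = "average")
  list(similaridade = similaridade, grupos = cutree(arvore, k = k))
}

# Made-up abstracts: two on matching estimators, two on life cycle assessment
resumos <- c("propensity score matching farm income",
             "propensity score matching household income",
             "life cycle assessment emissions",
             "life cycle assessment greenhouse emissions")
resultado <- agrupar_resumos(resumos, k = 2)
print(round(resultado$similaridade, 2))
print(resultado$grupos)
```

Hierarchical clustering is used here only because `cutree()` makes the choice of k explicit and deterministic; k-means on the same matrix would be an equally plausible stand-in.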
pacotes <- c("XML", "readxl", "topicmodels", "caret", "tidyr", "ggplot2", "quanteda", "pdftools","stringr","NLP","curl", "tidytext", "wordcloud", "dplyr", "SnowballC", "stopwords", "pdftools", "tm", "RColorBrewer", "magrittr", "knitr")
# Install only the packages that are not yet installed, in a single call
instalador <- pacotes[!pacotes %in% rownames(installed.packages())]
if (length(instalador) > 0) {
  install.packages(instalador, dependencies = TRUE)
}
# Load every package; the named logical vector reports which ones loaded
sapply(pacotes, require, character.only = TRUE)
## Loading required package: XML
## Loading required package: readxl
## Loading required package: topicmodels
## Loading required package: caret
## Loading required package: ggplot2
## Loading required package: lattice
## Loading required package: tidyr
## Loading required package: quanteda
## Package version: 3.2.1
## Unicode version: 14.0
## ICU version: 70.1
## Parallel computing: 8 of 8 threads used.
## See https://quanteda.io for tutorials and examples.
## Loading required package: pdftools
## Using poppler version 22.04.0
## Loading required package: stringr
## Loading required package: NLP
##
## Attaching package: 'NLP'
## The following objects are masked from 'package:quanteda':
##
## meta, meta<-
## The following object is masked from 'package:ggplot2':
##
## annotate
## Loading required package: curl
## Using libcurl 7.64.1 with LibreSSL/2.8.3
## Loading required package: tidytext
## Loading required package: wordcloud
## Loading required package: RColorBrewer
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Loading required package: SnowballC
## Loading required package: stopwords
## Loading required package: tm
##
## Attaching package: 'tm'
## The following object is masked from 'package:stopwords':
##
## stopwords
## The following object is masked from 'package:quanteda':
##
## stopwords
## Loading required package: magrittr
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:tidyr':
##
## extract
## Loading required package: knitr
##          XML       readxl  topicmodels        caret        tidyr      ggplot2
##         TRUE         TRUE         TRUE         TRUE         TRUE         TRUE
##     quanteda     pdftools      stringr          NLP         curl     tidytext
##         TRUE         TRUE         TRUE         TRUE         TRUE         TRUE
##    wordcloud        dplyr    SnowballC    stopwords     pdftools           tm
##         TRUE         TRUE         TRUE         TRUE         TRUE         TRUE
## RColorBrewer     magrittr        knitr
##         TRUE         TRUE         TRUE
base_mba <- readxl::read_excel(path = "Scopus _ Base com registros para análise MBA(rotulada).xlsx")
base_mba <- data.frame(base_mba)
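# Convert abstracts to lowercase
base_mba$Abstract <- tolower(base_mba$Abstract)
# Indices of abstracts that do NOT mention "health" (the rows to keep)
base_mba_remove <- grep("health", base_mba$Abstract, invert = TRUE)
# Titles were not lowercased, so this match is case-sensitive; listed for inspection only
base_mba_remove_title <- grep("health", base_mba$Title)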
base_mba_remove_title
## [1] 33 88 148 187 207 209 231 241 249 272 311 357 385 389 431
## [16] 516 518 582 629 676 772 777 818 854 1032 1058 1059 1067 1078 1115
## [31] 1132 1166 1171 1217 1236 1248 1281 1282 1304 1306 1314 1329 1342 1344 1345
## [46] 1347 1367 1368 1372 1374 1376 1381 1382 1395 1397 1398 1399 1412 1422 1435
## [61] 1446 1447 1449 1450 1451 1458 1459 1479 1480 1481 1485 1486 1495 1496 1498
## [76] 1499 1500 1582 1620 1634 1651 1669 1687 1701 1753 1755 1763 1775 1782 1791
## [91] 1796 1831 1837 1839 1897 1948 1972 1974 2002 2085 2098 2133 2171 2214 2268
## [106] 2296 2303 2315 2323 2332
# Keep only the records whose abstract does not mention "health"
base_mba <- base_mba[base_mba_remove, ]
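The filter above relies on two behaviours of `grep()`: with `invert = TRUE` it returns the indices of elements that do *not* match, and it is case-sensitive unless `ignore.case = TRUE` is given. Since only the abstracts were lowercased, a title starting with "Health" is not caught by the title check. The toy example below uses made-up titles to show both effects; it is a sketch for inspection, not a change to the original pipeline.

```r
# Made-up titles; the "abstracts" are simply their lowercased versions
titulos <- c("Health impacts of pesticide policy",
             "Farm income stabilization tool",
             "One health approach to livestock")
resumos <- tolower(titulos)

# Indices of abstracts that do NOT mention "health" (rows that would be kept)
manter <- grep("health", resumos, invert = TRUE)
print(manter)

# Case-sensitive match on the original titles misses "Health" in title 1
print(grep("health", titulos))

# Case-insensitive match finds both titles
print(grep("health", titulos, ignore.case = TRUE))
```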
# Number of works per year (records from 2021 onward are excluded)
base_mba %>% filter(Year < 2021) %>%
  ggplot(aes(x = Year)) +
  geom_bar() +
  labs(title = "Impact assessment related to agriculture and public policies",
       subtitle = "Number of works per year",
       caption = "Number of works analyzed",
       x = "Year",
       y = "Count")
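# Strip digits, then punctuation. Hyphens and other marks are deleted rather than
# replaced with a space, so "ex-post" becomes "expost"
base_mba <- base_mba %>%
  mutate(Abstract = gsub(pattern = "\\d",
                         replacement = "",
                         x = Abstract)) %>%
  mutate(Abstract = gsub(pattern = "[%,;?!.:()~-]",
                         replacement = "",
                         x = Abstract))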
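To see what the two substitutions do, the snippet below applies the same patterns to a made-up, already lowercased sentence. Deleting punctuation without inserting a space merges hyphenated words (`ex-post` becomes `expost`, a form that appears among the unigrams), and deleting digits can leave double spaces behind. The last lines check that the alternation pattern and a single bracket expression remove the same characters.

```r
# Made-up, already lowercased sentence
texto <- "an ex-post (2010-2015) evaluation: farm income rose 12%; see results."

# Step 1: remove digits; step 2: remove the listed punctuation characters
sem_digitos <- gsub("\\d", "", texto)
limpo <- gsub("%|,|;|\\?|\\!|\\-|\\.|\\:|\\(|\\)|~", "", sem_digitos)
print(limpo)

# The same punctuation set written as one bracket expression ("-" last = literal)
limpo_classe <- gsub("[%,;?!.:()~-]", "", sem_digitos)
print(identical(limpo, limpo_classe))
```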
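# Stopwords: the standard English list plus corpus-specific terms (publisher
# names, copyright notices, citation abbreviations). Because tm is attached after
# stopwords and quanteda, stopwords("en") here resolves to tm::stopwords().
# Only single words are listed: abstracts are split on spaces, so multi-word
# entries could never match a token.
stopword_en <- unique(c(stopwords("en"), "springer", "uk", "no", "abstract", "available",
                        "taylor", "francis", "group", "ltd", "rights", "reserved", "this",
                        "we", "old", "one", "an", "on", "of", "the", "in", "is", "et",
                        "al", "elsevier", "all"))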
# %in% returns TRUE for each element on the left that occurs in the vector on the right; ! negates the result
c(1:10)[!c(1:10) %in% c(3,4)]
## [1] 1 2 5 6 7 8 9 10
# Return the elements of x that do not appear in lixo (the stopword list)
remove_elements <- function(x, lixo){
  return(x[!x %in% lixo])
}
# Split each abstract on single spaces into a character vector of words
listaa <- strsplit(base_mba$Abstract, split = " ")
# Drop the stopwords from each abstract's word vector
lista2 <- lapply(X = listaa,
                 FUN = remove_elements,
                 lixo = stopword_en)
paste(lista2[[1]], collapse = " ")
## [1] "paper provides ex ante assessment effects income stabilization tool ist new risk management tool proposed common agricultural policy european union investigate effects ist income variability levels well income inequality farming population take italian agriculture example introduction ist currently discussion rich panel farms studied period years use stochastic simulation derive different income inequality estimates apply gini decomposition approaches assess distributional implications ist compare current income situation resulting hypothetical implementation ist different policy scenarios also accounting reduced levels cap direct payments find ist stabilizes farm income also enhances level reduces income inequality italian agriculture ist effective reducing income inequality farmers pay contributions mutual funds proportional income compared case flat rate contributions finally results support hypothesis impact ist will differ level direct payments reduced thus results seem robust enough accommodate future policy conditions © authors"
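Because each abstract is split on single spaces before `remove_elements()` runs, only single-word stopwords can ever match; a multi-word entry such as "of the" is never a token, and the empty strings produced by double spaces pass through untouched. A small check with a made-up sentence and a made-up stopword list:

```r
# remove_elements() as defined above: keep elements of x that are not in lixo
remove_elements <- function(x, lixo) {
  return(x[!x %in% lixo])
}

# Made-up abstract (note the double space) and a stopword list with a two-word entry
frase <- "the impact of the  policy on farm income"
lixo <- c("the", "of", "on", "of the")

palavras <- strsplit(frase, split = " ")[[1]]
restantes <- remove_elements(x = palavras, lixo = lixo)
print(restantes)
print(paste(restantes, collapse = " "))
```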
# Rejoin each abstract's remaining words into a single string
lista3 <- lapply(X = lista2,
                 FUN = function(x){
                   paste(x, collapse = " ")
                 })
# Convert the list back into a character vector
base_mba$Abstract <- unlist(lista3)
# Tokenization | unigrams (n = 1)
base_mba_tokens_1 <- base_mba %>%
  unnest_tokens(output = palavra_resumo,
                input = Abstract,
                token = "ngrams",
                n = 1)
# Bigrams (n = 2)
base_mba_tokens_2 <- base_mba %>%
  unnest_tokens(output = palavra_resumo,
                input = Abstract,
                token = "ngrams",
                n = 2)
# Trigrams (n = 3)
base_mba_tokens_3 <- base_mba %>%
  unnest_tokens(output = palavra_resumo,
                input = Abstract,
                token = "ngrams",
                n = 3)
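`unnest_tokens(token = "ngrams")` slides a window of n consecutive words over each abstract. Since the stopwords were removed beforehand, the window joins words that were not adjacent in the original text, which likely explains trigrams such as "purpose purpose paper". The base-R sketch below, built around a hypothetical helper `gerar_ngrams`, reproduces only the sliding-window idea; it does not replicate tidytext's own tokenization rules (lowercasing, punctuation handling).

```r
# Hypothetical helper: all n-grams of consecutive words in one text
gerar_ngrams <- function(texto, n) {
  palavras <- strsplit(texto, split = "\\s+")[[1]]
  palavras <- palavras[palavras != ""]
  if (length(palavras) < n) {
    return(character(0))
  }
  inicios <- seq_len(length(palavras) - n + 1)
  vapply(inicios,
         function(i) paste(palavras[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Made-up abstract fragment after stopword removal
resumo_limpo <- "purpose purpose paper assess propensity score matching"
print(gerar_ngrams(resumo_limpo, n = 2))
print(gerar_ngrams(resumo_limpo, n = 3))
```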
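# Minimum number of occurrences for an n-gram to be kept; the commented-out
# str_detect() filters would restrict the counts to method-related terms
NFILTER <- 3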
contagem_one_gramm <- base_mba_tokens_1 %>%
# filter(str_detect(string = palavra_resumo,
# pattern = "(model)|(method)|(fuzzy)|(interview)|(survey)|(payback)|(rif)|(siampi)|(asirpa)|(ambitec)")) %>%
count(palavra_resumo,
sort = TRUE) %>%
filter(n>=NFILTER)
contagem_two_gramm <- base_mba_tokens_2 %>%
# filter(str_detect(string = palavra_resumo,
# pattern = "(model)|(method)|(fuzzy)|(interview)|(survey)|(payback)|(rif)|(siampi)|(asirpa)|(ambitec)")) %>%
count(palavra_resumo,
sort = TRUE) %>%
filter(n>=NFILTER)
contagem_three_gramm <- base_mba_tokens_3 %>%
#filter(str_detect(string = palavra_resumo,
# pattern = "(model)|(method)|(fuzzy)|(interview)|(survey)|(payback)|(rif)|(siampi)|(asirpa)|(ambitec)")) %>%
count(palavra_resumo,
sort = TRUE) %>%
filter(n>=NFILTER)
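Each `count(palavra_resumo, sort = TRUE) %>% filter(n >= NFILTER)` pipeline tallies how often every n-gram occurs across the whole corpus, sorts by frequency, and discards n-grams seen fewer than `NFILTER` times. For readers less familiar with dplyr, the same logic in base R on made-up bigram tokens:

```r
NFILTER <- 3

# Made-up bigram tokens standing in for base_mba_tokens_2$palavra_resumo
tokens <- c("impact assessment", "public policy", "impact assessment",
            "climate change", "impact assessment", "public policy",
            "public policy", "ex post")

# Tally, sort by decreasing frequency, keep n >= NFILTER
contagem <- sort(table(tokens), decreasing = TRUE)
contagem <- contagem[contagem >= NFILTER]
resultado <- data.frame(palavra_resumo = names(contagem),
                        n = as.integer(contagem))
print(resultado)
```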
# Unigram chart (counts read from ngram.xlsx, with columns palavra_resumo and n)
ngram <- read_excel(path = "ngram.xlsx")
ngram <- as.data.frame(ngram)
ngram %>% filter(n > 600) %>%
  ggplot(mapping = aes(x = n, y = reorder(palavra_resumo, n), fill = n)) +
  geom_col(show.legend = FALSE) +
  labs(title = "Words | Unigrams",
       caption = "Unigrams occurring more than 600 times",
       x = "Count",
       y = "Unigram")
# Bigram chart (counts read from bigram.xlsx)
bigram <- read_excel(path = "bigram.xlsx")
bigram <- as.data.frame(x = bigram)
bigram %>% filter(n > 100) %>%
  ggplot(mapping = aes(x = n, y = reorder(palavra_resumo, n), fill = n)) +
  geom_col(show.legend = FALSE) +
  labs(title = "Words | Bigrams",
       caption = "Bigrams occurring more than 100 times",
       x = "Count",
       y = "Bigram")
# Trigram chart (counts read from trigram.xlsx)
trigram <- read_excel(path = "trigram.xlsx")
trigram <- as.data.frame(trigram)
trigram %>% filter(n >= 20) %>%
  ggplot(mapping = aes(x = n, y = reorder(palavra_resumo, n), fill = n)) +
  geom_col(show.legend = FALSE) +
  labs(title = "Words | Trigrams",
       caption = "Trigrams occurring at least 20 times",
       x = "Count",
       y = "Trigram")
# Fix the random seed so the word-cloud layout is reproducible
set.seed(1234)
# Word cloud | unigrams
wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq = 100,
          random.order = TRUE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"), max.words = 200)
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : environmental could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : agriculture could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : countries could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : implementation could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : application could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : practices could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : measures could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : ante could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : using could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : compared could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : limited could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : sustainability could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : impact could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : among could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : quality could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : studies could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : support could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : study could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : economic could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : indicators could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : knowledge could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : efficiency could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : implications could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : emissions could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : growth could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : yield could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : changes could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : assessment could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : expost could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : technology could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : food could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : article could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : productivity could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : management could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : evaluation could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : benefits could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : model could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : significant could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : farms could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : conditions could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : developed could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : large could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : developing could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : income could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : various could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : agricultural could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : increased could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : potential could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : decision could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : levels could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : important could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : within could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : show could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : findings could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : outcomes could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : level could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : regional could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : market could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : years could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : government could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : methods could not be fit on page. It will not be plotted.
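These warnings mean that `wordcloud()` left out every word it could not place, including terms central to the topic such as "impact", "assessment", and "agricultural". One possible remedy, sketched here on the assumption that a larger output device is acceptable, is to draw on a bigger canvas, shrink the font range through the `scale` argument, and set `random.order = FALSE` so the most frequent words are placed first. The frequencies, file name, and sizes below are made up and untuned.

```r
library(wordcloud)
library(RColorBrewer)

# Made-up word frequencies; in the pipeline these come from ngram.xlsx
frequencias <- data.frame(palavra_resumo = c("impact", "assessment", "agricultural",
                                             "policy", "income", "evaluation"),
                          n = c(900, 850, 800, 750, 400, 300))

arquivo <- file.path(tempdir(), "wordcloud_ngram.png")
# A larger device gives the layout algorithm more room
png(filename = arquivo, width = 1600, height = 1600, res = 200)
set.seed(1234)
# scale = c(largest, smallest) word size; smaller values let more words fit
wordcloud(words = frequencias$palavra_resumo,
          freq = frequencias$n,
          min.freq = 100,
          scale = c(3, 0.5),
          random.order = FALSE,
          rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))
dev.off()
```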
# Word cloud | bigrams
wordcloud(words = bigram$palavra_resumo, freq = bigram$n, min.freq = 20,
          random.order = TRUE, colors = brewer.pal(8, "Dark2"),
          max.words = 100)
## Warning in wordcloud(words = bigram$palavra_resumo, freq = bigram$n, min.freq =
## 20, : technology assessment could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = bigram$palavra_resumo, freq = bigram$n, min.freq =
## 20, : agricultural production could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = bigram$palavra_resumo, freq = bigram$n, min.freq =
## 20, : public policy could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = bigram$palavra_resumo, freq = bigram$n, min.freq =
## 20, : impact evaluation could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = bigram$palavra_resumo, freq = bigram$n, min.freq =
## 20, : environmental impact could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = bigram$palavra_resumo, freq = bigram$n, min.freq =
## 20, : climate change could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = bigram$palavra_resumo, freq = bigram$n, min.freq =
## 20, : ex post could not be fit on page. It will not be plotted.
# Word cloud | trigrams
wordcloud(words = trigram$palavra_resumo,
          freq = trigram$n, min.freq = 1,
          random.order = TRUE,
          rot.per = 0.53,
          colors = brewer.pal(8, "Dark2"),
          max.words = 150)
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : common agricultural policy could not be fit on page. It will not be
## plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : genetically modified gm could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : world scientific publishing could not be fit on page. It will not be
## plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : canadian agricultural economics could not be fit on page. It will not be
## plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : research development r could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : capital farmland construction could not be fit on page. It will not be
## plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : propensity score matching could not be fit on page. It will not be
## plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : fundamental public policy could not be fit on page. It will not be
## plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : environmental impact assessment could not be fit on page. It will not be
## plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : sustainability impact assessment could not be fit on page. It will not be
## plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : agricultural land use could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : design methodology approach could not be fit on page. It will not be
## plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : technology assessment ta could not be fit on page. It will not be
## plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : rural development measures could not be fit on page. It will not be
## plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : distribution reproduction medium could not be fit on page. It will not be
## plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : climate change impacts could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : economic environmental social could not be fit on page. It will not be
## plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : purpose purpose paper could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : environmental economic social could not be fit on page. It will not be
## plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : using propensity score could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : european review agricultural could not be fit on page. It will not be
## plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : farmers subsaharan africa could not be fit on page. It will not be
## plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : standard system agricultural could not be fit on page. It will not be
## plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : ictbased market information could not be fit on page. It will not be
## plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : attribution license permits could not be fit on page. It will not be
## plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : use distribution reproduction could not be fit on page. It will not be
## plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : resource economics society could not be fit on page. It will not be
## plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : greenhouse gas emissions could not be fit on page. It will not be
## plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : endogenous switching regression could not be fit on page. It will not be
## plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : internal rate return could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : agricultural applied economics could not be fit on page. It will not be
## plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : wellfacilitied capital farmland could not be fit on page. It will not be
## plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : society plant pathology could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : score matching psm could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : agricultural sustainable intensification could not be fit on page. It
## will not be plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : smallholder farming systems could not be fit on page. It will not be
## plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : ex post impact could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : quality xiaozhan rice could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : modified gm crops could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : positive mathematical programming could not be fit on page. It will not
## be plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : agricultural policy cap could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
# Methodologies chart (abstracts)
metodologia <- read_excel(path = "metodologias_trigram.xlsx")
metodologia <- as.data.frame(x = metodologia)
metodologia %>% filter(qtd >= 6) %>%
  ggplot(mapping = aes(x = qtd, y = reorder(metodologia, qtd), fill = qtd)) +
  geom_col(show.legend = FALSE) +
  labs(title = "Methodologies identified - Abstract",
       caption = "Methodologies with 6 or more occurrences",
       x = "Count",
       y = "Methodology")
# Methodologies chart (author keywords)
mtd_key_au <- read_excel(path = "metodologias_keyword_au.xlsx")
mtd_key_au <- as.data.frame(x = mtd_key_au)
mtd_key_au %>% filter(qtd >= 2) %>%
  ggplot(mapping = aes(x = qtd, y = reorder(metodologia, qtd), fill = qtd)) +
  geom_col(show.legend = FALSE) +
  labs(title = "Methodologies identified - Author keywords",
       caption = "Methodologies with 2 or more occurrences",
       x = "Count",
       y = "Methodology")
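# Build the corpus from the abstracts
corpus <- Corpus(VectorSource(base_mba$Abstract))
# Build the document-term matrix: stemming, stopword removal (tm's default list),
# words of at least 3 characters, numbers and punctuation removed.
# tm expects wordLengths = c(min, max); it does not use a minWordLength option.
JSS_dtm <- DocumentTermMatrix(corpus,
                              control = list(stemming = TRUE, stopwords = TRUE,
                                             wordLengths = c(3, Inf),
                                             removeNumbers = TRUE, removePunctuation = TRUE))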
dim(JSS_dtm)
## [1] 2040 12841
nrow(JSS_dtm)
## [1] 2040
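DocumentTermMatrix() applies the control options to every abstract before counting terms. The block below is a minimal base-R sketch of those steps on two made-up sentences. It is an illustration, not tm's code: the stopword vector is a tiny hypothetical stand-in for tm's built-in list, stemming is skipped, and the exact order of operations inside tm may differ. Punctuation and digits are deleted rather than replaced by a space, mirroring tm's removePunctuation() and removeNumbers(), so a token such as "ICT-based" would become "ictbased".

```r
# Two hypothetical abstracts and a tiny, hypothetical stopword list
abstracts  <- c("Impact of 2 public policies on crop yield.",
                "Public policy and the adoption of crop technology!")
stop_words <- c("of", "on", "and", "the")

# Lower-case, delete punctuation and digits, split on whitespace,
# then drop stopwords and words shorter than 3 characters
tokenize <- function(txt) {
  txt  <- tolower(txt)
  txt  <- gsub("[[:punct:]]+", "", txt)
  txt  <- gsub("[[:digit:]]+", "", txt)
  toks <- unlist(strsplit(txt, "[[:space:]]+"))
  toks[nchar(toks) >= 3 & !(toks %in% stop_words)]
}

tokens <- lapply(abstracts, tokenize)
vocab  <- sort(unique(unlist(tokens)))

# Count each vocabulary term per document (documents as rows, terms as columns)
dtm <- t(vapply(tokens,
                function(tk) tabulate(match(tk, vocab), nbins = length(vocab)),
                integer(length(vocab))))
dimnames(dtm) <- list(paste0("doc", seq_along(abstracts)), vocab)
dtm
```

Without stemming, "policy" and "policies" remain separate columns; with stemming = TRUE, as in the call above, both reduce to "polici", the stem that appears in the topic terms later in this section.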
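# Term weighting: mean tf-idf of each term (tf normalised by document length,
# idf in log base 2), averaged over the documents in which the term occurs
term_tfidf <-
  tapply(JSS_dtm$v / slam::row_sums(JSS_dtm)[JSS_dtm$i], JSS_dtm$j, mean) *
  log2(nDocs(JSS_dtm) / slam::col_sums(JSS_dtm > 0))
# Terms with mean tf-idf >= 0.1 are stored in SS_dtm; the model below is
# fitted on JSS_dtm, i.e. without this filter
SS_dtm <- JSS_dtm[, term_tfidf >= 0.1]
# Keep only documents that still contain at least one term
JSS_dtm <- JSS_dtm[slam::row_sums(JSS_dtm) > 0, ]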
summary(slam::col_sums(JSS_dtm))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 1.00 2.00 18.76 6.00 2689.00
summary(term_tfidf)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01264 0.06074 0.08075 0.10401 0.11822 2.07581
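For term j, the code above averages the term's share of each document's tokens over the documents that contain j, and multiplies that average by log2(N / df_j), where N is the number of documents and df_j the number of documents containing j. Because tapply() runs over the non-zero triplets of the sparse matrix, absent terms do not enter the mean. A base-R sketch of the same quantity on a small hypothetical dense matrix (the function name mean_tfidf is illustrative, not part of tm or slam):

```r
# Hypothetical counts: 3 documents (rows) x 4 stemmed terms (columns)
dtm <- matrix(c(2, 1, 0, 0,
                0, 1, 3, 0,
                1, 1, 0, 2),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("doc", 1:3), c("crop", "farm", "water", "polici")))

mean_tfidf <- function(m) {
  tf  <- m / rowSums(m)                  # each count divided by its document's length
  idf <- log2(nrow(m) / colSums(m > 0))  # log2(N / number of documents containing the term)
  tf[m == 0] <- NA                       # ignore absent terms, as tapply() over triplets does
  colMeans(tf, na.rm = TRUE) * idf
}

tfidf <- mean_tfidf(dtm)
kept  <- dtm[, tfidf >= 0.1, drop = FALSE]  # same 0.1 cutoff as in the code above
colnames(kept)
```

A term such as "farm", which occurs in every document, gets idf = log2(1) = 0 and falls below any positive cutoff regardless of how often it is used.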
# Fit an LDA topic model with k = 30 topics (variational EM, the topicmodels default)
# and store it in a list under the name "VEM"
k <- 30
SEED <- 2010
jss_TM <- list(
  VEM = LDA(JSS_dtm, k = k, control = list(seed = SEED)))
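LDA models each abstract as a mixture of topics and each topic as a probability distribution over terms; each article is then assigned its most likely topic. topicmodels::LDA estimates this model by variational EM (the "VEM" default used above). The sketch below swaps in a different standard estimator, collapsed Gibbs sampling (topicmodels also offers it via method = "Gibbs"), because it fits in a few lines of base R. It is not the package's implementation: the function lda_gibbs, the toy vocabulary and documents, and the priors alpha and eta are illustrative assumptions.

```r
# Collapsed Gibbs sampler for LDA (illustrative sketch, not topicmodels' code)
# docs: list of integer word-id vectors; V: vocabulary size; K: number of topics
lda_gibbs <- function(docs, V, K, alpha = 0.1, eta = 0.01, iters = 200, seed = 2010) {
  set.seed(seed)
  D   <- length(docs)
  ndk <- matrix(0, D, K)  # words in document d assigned to topic k
  nkw <- matrix(0, K, V)  # times word w is assigned to topic k
  nk  <- numeric(K)       # total words assigned to topic k
  z   <- lapply(docs, function(w) sample.int(K, length(w), replace = TRUE))  # random start
  for (d in seq_len(D)) {
    for (i in seq_along(docs[[d]])) {
      k <- z[[d]][i]; w <- docs[[d]][i]
      ndk[d, k] <- ndk[d, k] + 1; nkw[k, w] <- nkw[k, w] + 1; nk[k] <- nk[k] + 1
    }
  }
  for (it in seq_len(iters)) {
    for (d in seq_len(D)) {
      for (i in seq_along(docs[[d]])) {
        k <- z[[d]][i]; w <- docs[[d]][i]
        # remove this token's current assignment from the counts
        ndk[d, k] <- ndk[d, k] - 1; nkw[k, w] <- nkw[k, w] - 1; nk[k] <- nk[k] - 1
        # full conditional p(z = k | all other assignments), up to a constant
        p <- (ndk[d, ] + alpha) * (nkw[, w] + eta) / (nk + V * eta)
        k <- sample.int(K, 1, prob = p)
        z[[d]][i] <- k
        ndk[d, k] <- ndk[d, k] + 1; nkw[k, w] <- nkw[k, w] + 1; nk[k] <- nk[k] + 1
      }
    }
  }
  list(theta = (ndk + alpha) / rowSums(ndk + alpha),  # document-topic proportions
       phi   = (nkw + eta) / rowSums(nkw + eta))      # topic-word distributions
}

# Hypothetical toy data: word ids index into vocab
vocab <- c("crop", "yield", "soil", "polici", "public", "govern")
docs  <- list(c(1, 2, 3, 1, 2), c(2, 3, 1, 3), c(4, 5, 6, 4), c(5, 6, 4, 5, 6))
fit   <- lda_gibbs(docs, V = length(vocab), K = 2)
apply(fit$theta, 1, which.max)  # most likely topic for each toy document
```

Here theta plays the role of the per-document topic probabilities and phi the per-topic term probabilities that a fitted model exposes.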
Topic <- topics(jss_TM[["VEM"]], 1)  # most likely topic for each article
table(Topic)
## Topic
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## 54 52 58 34 46 43 82 108 176 65 144 48 71 42 64 71 73 49 101 54
## 21 22 23 24 25 26 27 28 29 30
## 46 43 43 48 77 124 53 69 47 55
Terms <- terms(x = jss_TM[["VEM"]], 5)  # five most probable terms (stems) for each topic
Terms[,1:30]
## Topic 1 Topic 2 Topic 3 Topic 4 Topic 5 Topic 6
## [1,] "crop" "land" "crop" "farmer" "assess" "law"
## [2,] "system" "area" "yield" "use" "regul" "public"
## [3,] "sustain" "flood" "soil" "prefer" "research" "industri"
## [4,] "use" "use" "use" "organ" "regulatori" "polici"
## [5,] "pesticid" "territori" "water" "nitrogen" "impact" "use"
## Topic 7 Topic 8 Topic 9 Topic 10 Topic 11 Topic 12
## [1,] "farm" "polici" "food" "farmer" "impact" "climat"
## [2,] "model" "public" "agricultur" "technolog" "evalu" "adapt"
## [3,] "agricultur" "polit" "product" "product" "research" "chang"
## [4,] "use" "social" "avail" "agricultur" "develop" "strategi"
## [5,] "polici" "govern" "secur" "adopt" "agricultur" "agricultur"
## Topic 13 Topic 14 Topic 15 Topic 16 Topic 17 Topic 18 Topic 19
## [1,] "research" "energi" "forest" "technolog" "impact" "cost" "model"
## [2,] "invest" "plant" "conserv" "evalu" "polici" "farm" "use"
## [3,] "innov" "biomass" "land" "assess" "measur" "use" "assess"
## [4,] "develop" "use" "ecosystem" "polici" "effect" "result" "system"
## [5,] "return" "product" "servic" "ethic" "region" "farmer" "develop"
## Topic 20 Topic 21 Topic 22 Topic 23 Topic 24 Topic 25
## [1,] "water" "product" "women" "energi" "countri" "market"
## [2,] "irrig" "impact" "studi" "emiss" "develop" "trade"
## [3,] "use" "use" "use" "pollut" "econom" "price"
## [4,] "econom" "environment" "gender" "polici" "polici" "contract"
## [5,] "univers" "drought" "differ" "china" "use" "model"
## Topic 26 Topic 27 Topic 28 Topic 29 Topic 30
## [1,] "household" "farmer" "risk" "environment" "program"
## [2,] "agricultur" "climat" "insur" "impact" "agricultur"
## [3,] "adopt" "crop" "loss" "oil" "use"
## [4,] "incom" "forecast" "manag" "communiti" "estim"
## [5,] "impact" "season" "disast" "palm" "school"
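topics(x, 1) returns each document's most likely topic and terms(x, 5) each topic's five most likely terms. Conceptually this is an argmax over the document-topic probabilities and a top-k over the topic-term probabilities. The base-R sketch below shows both on small hand-written probability matrices; the values are hypothetical, not output of the model above.

```r
# Hypothetical document-topic probabilities: 3 documents x 2 topics (rows sum to 1)
doc_topic <- matrix(c(0.9, 0.1,
                      0.2, 0.8,
                      0.6, 0.4), nrow = 3, byrow = TRUE)
# Hypothetical topic-term probabilities: 2 topics x 4 terms (rows sum to 1)
term_prob <- matrix(c(0.50, 0.30, 0.15, 0.05,
                      0.05, 0.10, 0.25, 0.60), nrow = 2, byrow = TRUE,
                    dimnames = list(NULL, c("crop", "yield", "polici", "public")))

k <- 2
top_topic <- apply(doc_topic, 1, which.max)  # argmax per document, like topics(x, 1)
top_terms <- apply(term_prob, 1,             # top-k per topic, one column per topic
                   function(p) names(sort(p, decreasing = TRUE))[seq_len(k)])
top_topic
top_terms
```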
length(Topic)
## [1] 2040
# Attach each article's most likely topic as column VEM
base_mba <- base_mba %>%
  mutate(VEM = topics(jss_TM[["VEM"]], 1))
base_mba <- data.frame(base_mba)
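The mutate() call above assumes the rows of the document-term matrix and of base_mba correspond one-to-one and in the same order. That holds in this run (length(Topic) is 2040, the same number of documents the matrix had before filtering), but the row_sums() > 0 filter can drop abstracts left with no terms, and mutate() would then fail on the length mismatch. A minimal sketch of matching by document id instead, assuming the topic assignments are available as a vector named by document id; the ids and topics below are hypothetical.

```r
# Hypothetical: 5 articles; article "3" had no terms left and was dropped before LDA
doc_ids   <- c("1", "2", "3", "4", "5")                 # ids of all rows in the article table
topic_fit <- c("1" = 2L, "2" = 1L, "4" = 2L, "5" = 3L)  # most likely topic, named by document id

# Look up each article's topic by id; articles without a fitted topic get NA
vem <- unname(topic_fit[match(doc_ids, names(topic_fit))])
vem
```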
base_graf <- readxl::read_xlsx("base_atual.xlsx")  # read the data frame
base_df <- base_graf[, c("Source.title", "VEM")]   # keep only the columns of interest
df <- table(base_df)                               # contingency table of counts (journal x topic)
df <- as.data.frame.matrix(df)                     # convert the count table to a data frame
df_sum <- apply(df, 1, sum)                        # number of articles per journal, summed over topics
df_sum <- order(df_sum, decreasing = TRUE)         # row indices ordered by that total
df_sorted <- df[df_sum, ]                          # reorder journals by number of articles
df_sorted <- data.matrix(df_sorted)                # convert to a numeric matrix
heatmap(df_sorted[1:30, ], cexRow = 0.8)           # heatmap of the 30 journals with the most articles
# heatmap() clusters rows and columns, so it reorders topics and journals according to
# the clusters and draws dendrograms showing how the groups were formed. By default it
# also scales each row (scale = "row"), so colors show each journal's relative topic
# profile rather than raw counts.
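The journal-by-topic heatmap is built by counting articles per (journal, topic) pair, ordering journals by their row totals and keeping the top ones. The base-R sketch below reproduces those steps on hypothetical data (journal names and topics are made up) and adds the row standardization that heatmap() applies with its default scale = "row", so it is clear what the colors encode.

```r
# Hypothetical article-level data: journal and assigned topic
art <- data.frame(
  Source.title = c("J. A", "J. A", "J. A", "J. B", "J. B", "J. C"),
  VEM          = c(1, 2, 2, 1, 3, 3)
)
counts <- as.data.frame.matrix(table(art))           # journals x topics, article counts
ord    <- order(rowSums(counts), decreasing = TRUE)  # journals by total number of articles
top_n  <- 2                                          # the analysis above uses 30
top    <- data.matrix(counts[ord, ])[seq_len(min(top_n, nrow(counts))), , drop = FALSE]

# Row standardization as with scale = "row": each journal's counts are
# centered on their mean and divided by their standard deviation
z <- t(scale(t(top)))
round(z, 3)
```

Passing scale = "none" to heatmap() would color the raw counts instead; using min() when selecting rows also avoids an out-of-range index if fewer journals than requested are present.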
# Number of articles per journal
contagem_revistas <- base_graf %>%
  count(Source.title, sort = TRUE)
View(contagem_revistas)
contagem_revistas %>% filter(n >= 20) %>%
  ggplot(mapping = aes(x = n, y = reorder(Source.title, n), fill = n)) +
  geom_col(show.legend = FALSE) +
  labs(title = "Journals with the most articles",
       caption = "Journals with 20 or more articles",
       x = "Count",
       y = "Journals")
# Distribution of publication years by document type
base_graf %>%
  ggplot(mapping = aes(x = Year, y = Document.Type)) +
  geom_boxplot() +
  labs(title = "Publication types",
       caption = "Publication types, 1980 to 2020",
       x = "Year",
       y = "Document type")
GRÜN, B.; HORNIK, K. topicmodels: An R Package for Fitting Topic Models. Journal of Statistical Software, v. 40, n. 13, p. 1-30, 2011. DOI: 10.18637/jss.v040.i13.
SILGE, J.; ROBINSON, D. Text Mining with R: A Tidy Approach. O'Reilly Media, 2017.
ZHU, H. kableExtra. R package.