1 Project Context

1.1 Title

Text Mining Applied to Studies on the Impact Assessment of Agricultural Public Policies

1.2 Abstract

Impact assessment studies, first carried out in the early 1960s, have become a recurring theme in the literature on science and technology policy, attracting the interest of researchers and of actors involved in innovation. Following the evolution of science and technology production, these studies have been approached along multiple dimensions, chiefly the social, economic, environmental, and technological impacts associated with different areas. Brazil began conducting such studies in the 1980s, and many works on the impact assessment of agricultural public policies have been published and indexed in databases over the last 40 years. Against this background, the objectives of this research are (1) to identify the terms and methodologies most often used in these studies and (2) to identify the similarities between them, in order to strengthen the impact investigation activities carried out by a team at Embrapa. The methodology adopted was systematic mapping, applying text mining techniques for tokenization and topic modeling to a set of structured textual data obtained from the Scopus database. The result was a set of tokens, represented as n-grams (unigrams, bigrams, and trigrams), which made it possible to identify the main subjects addressed in the works. About 90 distinct methodologies were also identified. For the similarity analysis, a total of 30 groups (k) were obtained and organized in a panel to make the results easier to visualize and interpret.

1.3 General Objective

To analyze a representative corpus of studies on the impact assessment of agricultural public policies, published over the last 40 years and indexed in an international database, in order to identify themes, methodologies, and similarities among the works.

2 Loading the Packages

pacotes <- c("XML", "readxl", "topicmodels", "caret", "tidyr", "ggplot2", "quanteda", "pdftools","stringr","NLP","curl", "tidytext", "wordcloud", "dplyr", "SnowballC", "stopwords", "pdftools", "tm", "RColorBrewer", "magrittr", "knitr")

# Install any packages that are not yet available, then attach all of them
instalador <- pacotes[!pacotes %in% rownames(installed.packages())]
if (length(instalador) > 0) {
  install.packages(instalador, dependencies = TRUE)
}
sapply(pacotes, require, character.only = TRUE)

## Loading required package: XML

## Loading required package: readxl

## Loading required package: topicmodels

## Loading required package: caret

## Loading required package: ggplot2

## Loading required package: lattice

## Loading required package: tidyr

## Loading required package: quanteda

## Package version: 3.2.1
## Unicode version: 14.0
## ICU version: 70.1

## Parallel computing: 8 of 8 threads used.

## See https://quanteda.io for tutorials and examples.

## Loading required package: pdftools

## Using poppler version 22.04.0

## Loading required package: stringr

## Loading required package: NLP

## 
## Attaching package: 'NLP'

## The following objects are masked from 'package:quanteda':
## 
##     meta, meta<-

## The following object is masked from 'package:ggplot2':
## 
##     annotate

## Loading required package: curl

## Using libcurl 7.64.1 with LibreSSL/2.8.3

## Loading required package: tidytext

## Loading required package: wordcloud

## Loading required package: RColorBrewer

## Loading required package: dplyr

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

## Loading required package: SnowballC

## Loading required package: stopwords

## Loading required package: tm

## 
## Attaching package: 'tm'

## The following object is masked from 'package:stopwords':
## 
##     stopwords

## The following object is masked from 'package:quanteda':
## 
##     stopwords

## Loading required package: magrittr

## 
## Attaching package: 'magrittr'

## The following object is masked from 'package:tidyr':
## 
##     extract

## Loading required package: knitr

##          XML       readxl  topicmodels        caret        tidyr      ggplot2 
##         TRUE         TRUE         TRUE         TRUE         TRUE         TRUE 
##     quanteda     pdftools      stringr          NLP         curl     tidytext 
##         TRUE         TRUE         TRUE         TRUE         TRUE         TRUE 
##    wordcloud        dplyr    SnowballC    stopwords     pdftools           tm 
##         TRUE         TRUE         TRUE         TRUE         TRUE         TRUE 
## RColorBrewer     magrittr        knitr 
##         TRUE         TRUE         TRUE

3 Loading the File

base_mba <- readxl::read_excel(path = "Scopus _ Base com registros para análise MBA(rotulada).xlsx")
base_mba <- data.frame(base_mba) 

# Convert the abstracts to lowercase
base_mba$Abstract <- tolower(base_mba$Abstract)

3.1 Filtering the Dataset

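# Row indices of abstracts that do NOT mention "health" (these are the rows to keep)
base_mba_remove <- grep("health", base_mba$Abstract, invert = TRUE)
# Row indices of titles that mention "health", printed for inspection only
# (Title was not lowercased, so this match is case-sensitive)
base_mba_remove_title <- grep("health", base_mba$Title)
base_mba_remove_title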

##   [1]   33   88  148  187  207  209  231  241  249  272  311  357  385  389  431
##  [16]  516  518  582  629  676  772  777  818  854 1032 1058 1059 1067 1078 1115
##  [31] 1132 1166 1171 1217 1236 1248 1281 1282 1304 1306 1314 1329 1342 1344 1345
##  [46] 1347 1367 1368 1372 1374 1376 1381 1382 1395 1397 1398 1399 1412 1422 1435
##  [61] 1446 1447 1449 1450 1451 1458 1459 1479 1480 1481 1485 1486 1495 1496 1498
##  [76] 1499 1500 1582 1620 1634 1651 1669 1687 1701 1753 1755 1763 1775 1782 1791
##  [91] 1796 1831 1837 1839 1897 1948 1972 1974 2002 2085 2098 2133 2171 2214 2268
## [106] 2296 2303 2315 2323 2332

# Keep only the rows whose abstract does not mention "health"
base_mba <- base_mba[base_mba_remove, ]
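
The filter above relies on grep(), whose pattern argument is a regular expression rather than a shell-style wildcard, so a trailing * quantifies only the character immediately before it. The sketch below uses a made-up vector (toy_abstracts is hypothetical and not part of the Scopus base) to compare three ways of matching the keyword and to show how invert = TRUE yields the indices of the rows to keep.

```r
# Hypothetical toy strings, not taken from the Scopus base
toy_abstracts <- c("public health policy", "healt", "crop yield", "Health outcomes")

# As a regex, "health*" means "healt" followed by zero or more "h"
regex_star <- grepl("health*", toy_abstracts)

# Literal substring match (case-sensitive)
literal <- grepl("health", toy_abstracts, fixed = TRUE)

# Case-insensitive match, useful when the text has not been lowercased
any_case <- grepl("health", toy_abstracts, ignore.case = TRUE)

# invert = TRUE returns the indices of the elements that do NOT match
keep <- grep("health", toy_abstracts, ignore.case = TRUE, invert = TRUE)

print(data.frame(toy_abstracts, regex_star, literal, any_case))
print(keep)
```

Lowercasing the column before matching, as is done for Abstract, makes the case-sensitive and case-insensitive variants equivalent.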

3.2 Number of Papers per Year

base_mba %>%
  filter(Year < 2021) %>%
  ggplot(aes(x = Year)) +
  geom_bar(show.legend = TRUE) +
  labs(title = "Impact assessments related to agriculture and public policy",
       subtitle = "Number of papers per year",
       caption = "Number of papers analyzed, by year of publication",
       x = "Year",
       y = "Count")

3.3 Cleaning the Dataset: Regex and Stopwords

base_mba <- base_mba %>% 
  mutate(Abstract = gsub(pattern = "\\d",
                         replacement = "",
                         x = Abstract)) %>% 
  mutate(Abstract = gsub(pattern = "%|,|;|\\?|\\!|\\-|\\.|\\:|\\(|\\)|~",
                         replacement = "",
                         x = Abstract))
# Stopwords: the standard English list plus publisher and boilerplate terms.
# Each abstract is later split into single words, so only single-word entries can match.
stopword_en <- unique(c(stopwords("en"), "springer", "uk", "no", "abstract", "available", "taylor", "francis", "group", "ltd", "rights", "reserved", "this", "we", "old", "one", "an", "on", "of", "the", "in", "is", "et", "al", "elsevier", "all"))
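
To see what the two gsub() substitutions applied to Abstract do to a single string, here is a small sketch on a made-up sentence (sample_text is hypothetical). The last step, which squeezes repeated spaces, is an optional extra that is not part of the pipeline above.

```r
# Hypothetical sentence, not taken from the Scopus base
sample_text <- "ex-post impact (2010-2015): income rose 12.5%!"

# Step 1: remove every digit
no_digits <- gsub(pattern = "\\d", replacement = "", x = sample_text)

# Step 2: remove the listed punctuation marks (same pattern as in the cleaning step)
cleaned <- gsub(pattern = "%|,|;|\\?|\\!|\\-|\\.|\\:|\\(|\\)|~",
                replacement = "",
                x = no_digits)

# Optional extra (an assumption, not in the original pipeline):
# trim the ends and collapse runs of whitespace into one space
squeezed <- gsub("\\s+", " ", trimws(cleaned))

print(no_digits)
print(cleaned)
print(squeezed)
```

Because the hyphen is deleted rather than replaced by a space, hyphenated terms are fused into one word, and removed numbers leave double spaces behind.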

3.3.1 Cleaning: Matching Against the Stopword List

# %in% tests, element by element, whether each value on the left appears in the vector on the right (TRUE/FALSE)
c(1:10)[!c(1:10) %in% c(3,4)]

## [1]  1  2  5  6  7  8  9 10

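# Return the elements of x that are not in lixo (the discard list)
remove_elements <- function(x, lixo){
  return(x[! x %in% lixo])
}

# Split each abstract into words on single spaces
listaa <- lapply(X = base_mba$Abstract,
                 FUN = function(x) {
                   strsplit(x, split = ' ')
                 })

# Drop the stopwords from each abstract's word vector
lista2 <- lapply(X = listaa,
                 FUN = function(elemento_de_lista){
                   remove_elements(x = elemento_de_lista[[1]],
                                   lixo = stopword_en)
                 })

# Inspect the first cleaned abstract
paste(lista2[[1]], collapse = " ")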

## [1] "paper provides ex ante assessment effects income stabilization tool ist new risk management tool proposed common agricultural policy european union investigate effects ist income variability levels well income inequality farming population take italian agriculture example introduction ist currently discussion rich panel  farms studied period  years use stochastic simulation derive different income inequality estimates apply gini decomposition approaches assess distributional implications ist compare current income situation resulting hypothetical implementation ist different policy scenarios also accounting reduced levels cap direct payments find ist stabilizes farm income also enhances level reduces income inequality italian agriculture ist effective reducing income inequality farmers pay contributions mutual funds proportional income compared case flat rate contributions finally results support hypothesis impact ist will differ level direct payments reduced thus results seem robust enough accommodate future policy conditions ©  authors"

# Collapse each word vector back into a single string per abstract
lista3 <- lapply(X = lista2,
                 FUN = function(x){ 
                   paste(x, collapse = " ")
                 })
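
The split, filter, and collapse steps can be traced on one made-up sentence. The sketch below is only an illustration under assumed inputs (the sentence and the discard vector are hypothetical); it also shows two side effects of splitting on a single space: a double space produces an empty token, and a multi-word entry in the discard list can never match.

```r
# Hypothetical inputs, not taken from the Scopus base
sentence <- "the impact of the  policy on farm income"
discard  <- c("the", "of", "on", "of the")

# Split on single spaces; the double space yields an empty string token
tokens <- strsplit(sentence, split = " ")[[1]]

# Keep only the tokens that are not in the discard list
kept <- tokens[!tokens %in% discard]

# The two-word entry "of the" never equals a single token
multiword_matches <- "of the" %in% tokens

# Drop empty tokens before collapsing back into one string
result <- paste(kept[nzchar(kept)], collapse = " ")

print(tokens)
print(kept)
print(multiword_matches)
print(result)
```

Without the nzchar() filter the collapsed string keeps the double space, which matches the doubled spaces visible in the printed abstract.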

4 Themes (Tokens)

4.1 Tokenization

# Flatten the list of cleaned abstracts back into a character vector
base_mba$Abstract <- unlist(lista3)

# Tokenization: unigrams (n = 1)
base_mba_tokens_1 <- base_mba %>%
  unnest_tokens(output = palavra_resumo,
                input = Abstract,
                token = "ngrams",
                n = 1)


# Bigrams (n = 2)
base_mba_tokens_2 <- base_mba %>%
  unnest_tokens(output = palavra_resumo,
                input = Abstract,
                token = "ngrams",
                n = 2)

# Trigrams (n = 3)
base_mba_tokens_3 <- base_mba %>%
  unnest_tokens(output = palavra_resumo,
                input = Abstract,
                token = "ngrams",
                n = 3)

# Minimum number of occurrences for an n-gram to be kept in the counts
NFILTER <- 3

contagem_one_gramm <- base_mba_tokens_1 %>% 
  # filter(str_detect(string = palavra_resumo,
  #                   pattern = "(model)|(method)|(fuzzy)|(interview)|(survey)|(payback)|(rif)|(siampi)|(asirpa)|(ambitec)")) %>% 
  count(palavra_resumo,
        sort = TRUE) %>% 
  filter(n>=NFILTER)

contagem_two_gramm <- base_mba_tokens_2 %>%
  # filter(str_detect(string = palavra_resumo,
  #                   pattern = "(model)|(method)|(fuzzy)|(interview)|(survey)|(payback)|(rif)|(siampi)|(asirpa)|(ambitec)")) %>%
  count(palavra_resumo,
        sort = TRUE) %>%
  filter(n>=NFILTER)

contagem_three_gramm <- base_mba_tokens_3 %>%
  #filter(str_detect(string = palavra_resumo,
                    # pattern = "(model)|(method)|(fuzzy)|(interview)|(survey)|(payback)|(rif)|(siampi)|(asirpa)|(ambitec)")) %>%
  count(palavra_resumo,
        sort = TRUE) %>%
  filter(n>=NFILTER)
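
The three tokenizations above differ only in the window size n. As a rough illustration of what an n-gram is and how the counts are obtained, here is a minimal base R sketch. The function make_ngrams and the token vector are hypothetical, and unnest_tokens() performs its own tokenization, so its output on real abstracts may differ in detail.

```r
# Hypothetical helper: slide a window of size n over a vector of tokens
make_ngrams <- function(tokens, n) {
  if (length(tokens) < n) {
    return(character(0))
  }
  starts <- seq_len(length(tokens) - n + 1)
  vapply(starts,
         function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Hypothetical tokens from one cleaned abstract
tokens <- c("impact", "assessment", "agricultural", "policy", "impact", "assessment")

bigrams  <- make_ngrams(tokens, 2)
trigrams <- make_ngrams(tokens, 3)

# Count occurrences and sort from most to least frequent
bigram_counts <- sort(table(bigrams), decreasing = TRUE)

print(bigrams)
print(bigram_counts)
```

A vector of length L yields L - n + 1 n-grams, which is why higher values of n produce fewer and rarer tokens, and why the plotting thresholds drop as n grows.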

4.2 Visualizing Themes (Tokens)

# Unigram bar chart
ngram <- read_excel(path = "ngram.xlsx")
ngram <- as.data.frame(ngram)

ngram %>% filter(n > 600) %>% 
  ggplot(mapping = aes(x = n, y = reorder(palavra_resumo, n), fill = n)) +
  geom_col(show.legend = FALSE) +
  labs(title = "Words | Unigram",
       caption = "Unigrams occurring more than 600 times",
       x = "Count",
       y = "Words - Unigram")

# Bigram bar chart
bigram <- read_excel(path = "bigram.xlsx")
bigram <- as.data.frame(x = bigram)

bigram %>% filter(n > 100) %>% 
  ggplot(mapping = aes(x = n, y = reorder(palavra_resumo, n), fill = n)) +
  geom_col(show.legend = FALSE) +
  labs(title = "Words | Bigram",
       caption = "Bigrams occurring more than 100 times",
       x = "Count",
       y = "Words - Bigram")

# Trigram bar chart
trigram <- read_excel(path = "trigram.xlsx")
trigram <- as.data.frame(trigram)

trigram %>% filter(n >= 20) %>% 
  ggplot(mapping = aes(x = n, y = reorder(palavra_resumo, n), fill = n)) +
  geom_col(show.legend = FALSE) +
  labs(title = "Words | Trigram",
       caption = "Trigrams occurring at least 20 times",
       x = "Count",
       y = "Words - Trigram")

4.3 Word Clouds

set.seed(1234)

## Wordcloud | Ngram
wordcloud(words = ngram$palavra_resumo,freq = ngram$n, min.freq = 100, random.order = TRUE, rot.per=0.35, colors = brewer.pal(8, "Dark2"),
          max.words = 200)

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : environmental could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : agriculture could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : countries could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : implementation could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : application could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : practices could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : measures could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : ante could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : using could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : compared could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : limited could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : sustainability could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : impact could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : among could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : quality could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : studies could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : support could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : study could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : economic could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : indicators could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : knowledge could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : efficiency could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : implications could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : emissions could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : growth could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : yield could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : changes could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : assessment could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : expost could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : technology could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : food could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : article could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : productivity could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : management could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : evaluation could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : benefits could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : model could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : significant could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : farms could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : conditions could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : developed could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : large could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : developing could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : income could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : various could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : agricultural could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : increased could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : potential could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : decision could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : levels could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : important could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : within could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : show could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : findings could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : outcomes could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : level could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : regional could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : market could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : years could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : government could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = ngram$palavra_resumo, freq = ngram$n, min.freq =
## 100, : methods could not be fit on page. It will not be plotted.

## Wordcloud | Bigram
wordcloud(words = bigram$palavra_resumo,freq = bigram$n, min.freq = 20, random.order = TRUE, colors = brewer.pal(8, "Dark2"),
          max.words = 100)

## Warning in wordcloud(words = bigram$palavra_resumo, freq = bigram$n, min.freq =
## 20, : technology assessment could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = bigram$palavra_resumo, freq = bigram$n, min.freq =
## 20, : agricultural production could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = bigram$palavra_resumo, freq = bigram$n, min.freq =
## 20, : public policy could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = bigram$palavra_resumo, freq = bigram$n, min.freq =
## 20, : impact evaluation could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = bigram$palavra_resumo, freq = bigram$n, min.freq =
## 20, : environmental impact could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = bigram$palavra_resumo, freq = bigram$n, min.freq =
## 20, : climate change could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = bigram$palavra_resumo, freq = bigram$n, min.freq =
## 20, : ex post could not be fit on page. It will not be plotted.

## Wordcloud | Trigram
wordcloud(words = trigram$palavra_resumo,
          freq = trigram$n, min.freq = 1, 
          random.order = TRUE, 
          rot.per=0.53,
          colors = brewer.pal(8, "Dark2"), 
          max.words = 150)

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : common agricultural policy could not be fit on page. It will not be
## plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : genetically modified gm could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : world scientific publishing could not be fit on page. It will not be
## plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : canadian agricultural economics could not be fit on page. It will not be
## plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : research development r could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : capital farmland construction could not be fit on page. It will not be
## plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : propensity score matching could not be fit on page. It will not be
## plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : fundamental public policy could not be fit on page. It will not be
## plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : environmental impact assessment could not be fit on page. It will not be
## plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : sustainability impact assessment could not be fit on page. It will not be
## plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : agricultural land use could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : design methodology approach could not be fit on page. It will not be
## plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : technology assessment ta could not be fit on page. It will not be
## plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : rural development measures could not be fit on page. It will not be
## plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : distribution reproduction medium could not be fit on page. It will not be
## plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : climate change impacts could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : economic environmental social could not be fit on page. It will not be
## plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : purpose purpose paper could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : environmental economic social could not be fit on page. It will not be
## plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : using propensity score could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : european review agricultural could not be fit on page. It will not be
## plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : farmers subsaharan africa could not be fit on page. It will not be
## plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : standard system agricultural could not be fit on page. It will not be
## plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : ictbased market information could not be fit on page. It will not be
## plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : attribution license permits could not be fit on page. It will not be
## plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : use distribution reproduction could not be fit on page. It will not be
## plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : resource economics society could not be fit on page. It will not be
## plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : greenhouse gas emissions could not be fit on page. It will not be
## plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : endogenous switching regression could not be fit on page. It will not be
## plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : internal rate return could not be fit on page. It will not be plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : agricultural applied economics could not be fit on page. It will not be
## plotted.

## Warning in wordcloud(words = trigram$palavra_resumo, freq = trigram$n, min.freq
## = 1, : wellfacilitied capital farmland could not be fit on page. It will not be
## plotted.

## [The same wordcloud() warning, "could not be fit on page. It will not be plotted.", is repeated for the remaining trigrams that did not fit, including "score matching psm", "computable general equilibrium", "life cycle assessment", "integrated pest management" and "farmer field school".]
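The warnings above appear because wordcloud() drops any term it cannot place on the plotting device. One way to reduce them, sketched here under assumptions, is to keep only the most frequent trigrams before plotting and to narrow the font range with the `scale` argument. The toy data frame and the cutoff of three terms are hypothetical; only the column names `palavra_resumo` and `n` come from the analysis.

```r
# Toy stand-in for the `trigram` data frame (values are made up for illustration)
trigram_demo <- data.frame(
  palavra_resumo = c("life cycle assessment", "ex post impact",
                     "farmer field school", "soil organic matter"),
  n = c(12, 9, 7, 2),
  stringsAsFactors = FALSE
)

# Keep only the top_n most frequent trigrams (top_n is an assumed cutoff)
top_n <- 3
ord <- order(trigram_demo$n, decreasing = TRUE)
trigram_top <- head(trigram_demo[ord, ], top_n)
print(trigram_top)

# Plot only if the wordcloud package is installed; a smaller `scale`
# leaves more room for long trigrams
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = trigram_top$palavra_resumo,
                       freq = trigram_top$n,
                       min.freq = 1,
                       scale = c(2, 0.4),
                       random.order = FALSE)
}
```

Whether a given trigram fits still depends on the size of the graphics device, so the cutoff and scale would need tuning on the real data.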

5 Identified Methodologies

5.1 Methodologies - Abstract

# Bar chart of the methodologies identified in the abstracts
metodologia <- read_excel(path = "metodologias_trigram.xlsx")
metodologia <- as.data.frame(x = metodologia)

# Keep methodologies with at least 6 occurrences, ordered by frequency
metodologia %>% filter(qtd >= 6) %>%
  ggplot(mapping = aes(x = qtd, y = reorder(metodologia, qtd), fill = qtd)) +
  geom_col(show.legend = FALSE) +
  labs(title = "Identified Methodologies - Abstract",
       caption = "Methodologies with 6 or more occurrences",
       x = "Count",
       y = "Methodology")

5.2 Methodologies - Keyword

# Bar chart of the methodologies identified in the author keywords
mtd_key_au <- read_excel(path = "metodologias_keyword_au.xlsx")
mtd_key_au <- as.data.frame(x = mtd_key_au)

# Keep methodologies with at least 2 occurrences, ordered by frequency
mtd_key_au %>% filter(qtd >= 2) %>%
  ggplot(mapping = aes(x = qtd, y = reorder(metodologia, qtd), fill = qtd)) +
  geom_col(show.legend = FALSE) +
  labs(title = "Identified Methodologies - Author Keywords",
       caption = "Methodologies with 2 or more occurrences",
       x = "Count",
       y = "Methodology")
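Both plots in this section restate the filter cutoff by hand in the caption. A minimal sketch of one way to keep the two in step is to define the cutoff once and build the caption from it. The names `cutoff`, `mtd_demo`, `mtd_filtered` and `caption_text` are hypothetical, and the counts are made up; only the column names `metodologia` and `qtd` follow the spreadsheets used above.

```r
# Toy methodology counts (made up for illustration)
mtd_demo <- data.frame(
  metodologia = c("propensity score matching", "life cycle assessment",
                  "cost benefit analysis"),
  qtd = c(8, 3, 1)
)

cutoff <- 2  # assumed cutoff, defined in one place

# The same value drives both the filter and the caption text
mtd_filtered <- mtd_demo[mtd_demo$qtd >= cutoff, ]
caption_text <- sprintf("Methodologies with %d or more occurrences", cutoff)

print(mtd_filtered)
print(caption_text)
```

In the ggplot2 pipeline, `caption_text` could then be passed as `labs(caption = caption_text)`.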

6 Topic Modeling

# Build the corpus from the abstracts

corpus <- Corpus(VectorSource(base_mba$Abstract))

# Build the document-term matrix (stemming, stopword removal, words of 3+ characters)

JSS_dtm <- DocumentTermMatrix(corpus,
                              control = list(stemming = TRUE, stopwords = TRUE, wordLengths = c(3, Inf),
                                             removeNumbers = TRUE, removePunctuation = TRUE))
dim(JSS_dtm)

## [1]  2040 12841

nrow(JSS_dtm)

## [1] 2040

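# Mean tf-idf of each term over the documents in which it occurs

term_tfidf <-
  tapply(JSS_dtm$v/slam::row_sums(JSS_dtm)[JSS_dtm$i], JSS_dtm$j, mean) *
  log2(nDocs(JSS_dtm)/slam::col_sums(JSS_dtm > 0))

# Note: the filtered matrix is assigned to SS_dtm, which is not used again, so the
# tf-idf cutoff of 0.1 does not change JSS_dtm or the results shown below
SS_dtm <- JSS_dtm[, term_tfidf >= 0.1]
# Drop documents that contain no terms
JSS_dtm <- JSS_dtm[slam::row_sums(JSS_dtm) > 0,]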
summary(slam::col_sums(JSS_dtm))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    1.00    2.00   18.76    6.00 2689.00

summary(term_tfidf)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.01264 0.06074 0.08075 0.10401 0.11822 2.07581
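The `term_tfidf` expression is compact because it works directly on the sparse triplet representation of the matrix. As a rough illustration, the same quantity can be computed on a small dense matrix in base R: the mean relative frequency of a term over the documents that contain it, multiplied by log2 of the inverse document frequency. The counts and the names `m`, `tf`, `mean_tf`, `idf` and `term_tfidf_demo` are made up for this sketch.

```r
# Toy document-term counts (made up): 3 documents x 3 terms
m <- matrix(c(2, 0, 1,
              1, 1, 1,
              0, 3, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("doc", 1:3), c("crop", "soil", "use")))

# Relative frequency of each term within its document
tf <- m / rowSums(m)

# Mean relative frequency over the documents where the term occurs,
# mirroring tapply(v / row_sums[i], j, mean) on the sparse matrix
mean_tf <- sapply(seq_len(ncol(m)), function(j) mean(tf[m[, j] > 0, j]))

# Inverse document frequency: log2(number of docs / docs containing the term)
idf <- log2(nrow(m) / colSums(m > 0))

term_tfidf_demo <- setNames(mean_tf * idf, colnames(m))
print(round(term_tfidf_demo, 3))
```

In this toy case "use" occurs in every document, so its inverse document frequency is log2(1) = 0 and its score is zero. Terms of this kind are the ones a tf-idf cutoff is meant to remove.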

6.1 Clustering by similarity

# Fit an LDA model with k topics (VEM estimation) and store it under the name "VEM"

k <- 30
SEED <- 2010

jss_TM <- list(
  VEM = LDA(JSS_dtm, k = k, control = list(seed = SEED)))

Topic <- topics(jss_TM[["VEM"]], 1)  # most likely topic for each article
table(Topic)

## Topic
##   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20 
##  54  52  58  34  46  43  82 108 176  65 144  48  71  42  64  71  73  49 101  54 
##  21  22  23  24  25  26  27  28  29  30 
##  46  43  43  48  77 124  53  69  47  55

Terms <- terms(x = jss_TM[["VEM"]], 5) # the five most likely terms for each topic
Terms[,1:30]

##      Topic 1    Topic 2     Topic 3 Topic 4    Topic 5      Topic 6   
## [1,] "crop"     "land"      "crop"  "farmer"   "assess"     "law"     
## [2,] "system"   "area"      "yield" "use"      "regul"      "public"  
## [3,] "sustain"  "flood"     "soil"  "prefer"   "research"   "industri"
## [4,] "use"      "use"       "use"   "organ"    "regulatori" "polici"  
## [5,] "pesticid" "territori" "water" "nitrogen" "impact"     "use"     
##      Topic 7      Topic 8  Topic 9      Topic 10     Topic 11     Topic 12    
## [1,] "farm"       "polici" "food"       "farmer"     "impact"     "climat"    
## [2,] "model"      "public" "agricultur" "technolog"  "evalu"      "adapt"     
## [3,] "agricultur" "polit"  "product"    "product"    "research"   "chang"     
## [4,] "use"        "social" "avail"      "agricultur" "develop"    "strategi"  
## [5,] "polici"     "govern" "secur"      "adopt"      "agricultur" "agricultur"
##      Topic 13   Topic 14  Topic 15    Topic 16    Topic 17 Topic 18 Topic 19 
## [1,] "research" "energi"  "forest"    "technolog" "impact" "cost"   "model"  
## [2,] "invest"   "plant"   "conserv"   "evalu"     "polici" "farm"   "use"    
## [3,] "innov"    "biomass" "land"      "assess"    "measur" "use"    "assess" 
## [4,] "develop"  "use"     "ecosystem" "polici"    "effect" "result" "system" 
## [5,] "return"   "product" "servic"    "ethic"     "region" "farmer" "develop"
##      Topic 20  Topic 21      Topic 22 Topic 23 Topic 24  Topic 25  
## [1,] "water"   "product"     "women"  "energi" "countri" "market"  
## [2,] "irrig"   "impact"      "studi"  "emiss"  "develop" "trade"   
## [3,] "use"     "use"         "use"    "pollut" "econom"  "price"   
## [4,] "econom"  "environment" "gender" "polici" "polici"  "contract"
## [5,] "univers" "drought"     "differ" "china"  "use"     "model"   
##      Topic 26     Topic 27   Topic 28 Topic 29      Topic 30    
## [1,] "household"  "farmer"   "risk"   "environment" "program"   
## [2,] "agricultur" "climat"   "insur"  "impact"      "agricultur"
## [3,] "adopt"      "crop"     "loss"   "oil"         "use"       
## [4,] "incom"      "forecast" "manag"  "communiti"   "estim"     
## [5,] "impact"     "season"   "disast" "palm"        "school"

length(Topic)

## [1] 2040

base_mba <- base_mba %>%   
  mutate(VEM = topics(jss_TM[["VEM"]], 1))

base_mba <- data.frame(base_mba)
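The grouping step relies on `topics(..., 1)`, which returns the single most likely topic for each document. Conceptually this is a row-wise arg-max over the document-topic probabilities. The sketch below reproduces that idea in base R on a hypothetical probability matrix; in the real analysis such a matrix would come from the fitted model (for example `posterior(jss_TM[["VEM"]])$topics`), and the names `gamma_demo` and `topic_demo` are assumptions.

```r
# Hypothetical document-topic probabilities (rows = documents, columns = topics)
gamma_demo <- matrix(c(0.70, 0.20, 0.10,
                       0.15, 0.25, 0.60,
                       0.30, 0.45, 0.25,
                       0.05, 0.05, 0.90),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(paste0("doc", 1:4), paste0("Topic", 1:3)))

# Most probable topic per document (index of the largest value in each row)
topic_demo <- apply(gamma_demo, 1, which.max)
print(topic_demo)

# Number of documents assigned to each topic, as in table(Topic)
print(table(topic_demo))
```

Because each article is assigned to exactly one topic, the counts in the table always sum to the number of documents.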

6.2 Heat map

base_graf <- readxl::read_xlsx("base_atual.xlsx") # read the data frame

base_df <- base_graf[, c("Source.title", "VEM")] # keep only the columns of interest

df <- table(base_df) # journal x topic table of article counts

df <- as.data.frame.matrix(df) # convert the count table to a data frame

df_sum <- apply(df, 1, sum) # total number of articles per journal, summed across topics

df_sum <- order(df_sum, decreasing = TRUE) # row order, from most to fewest articles

df_sorted <- df[df_sum,] # sort the journals by their total number of articles

df_sorted <- data.matrix(df_sorted) # convert the data frame to a numeric matrix

heatmap(df_sorted[1:30, ], cexRow=0.8) # heat map of the 30 journals with the most articles

# heatmap() applies hierarchical clustering by default, so it reorders topics and journals
# according to the clusters and draws dendrograms showing how they were grouped
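The data preparation for the heat map can be followed more easily on a small example. The sketch below runs the same sequence (contingency table, conversion to a data frame, ordering by row totals, conversion to a numeric matrix) on made-up data. The names `base_demo`, `counts`, `counts_sorted` and `mat` are hypothetical; the column names `Source.title` and `VEM` follow the analysis.

```r
# Toy data (made up): one row per article with its journal and assigned topic
base_demo <- data.frame(
  Source.title = c("Journal A", "Journal A", "Journal A",
                   "Journal B", "Journal C", "Journal C"),
  VEM = c(1, 2, 2, 1, 3, 3)
)

# Journal x topic contingency table, converted to a plain data frame
counts <- as.data.frame.matrix(table(base_demo))

# Order journals by their total number of articles, largest first
counts_sorted <- counts[order(rowSums(counts), decreasing = TRUE), ]
print(counts_sorted)

# Numeric matrix ready to be passed to heatmap()
mat <- data.matrix(counts_sorted)
```

If the sorted order should be preserved in the plot, `heatmap(mat, Rowv = NA, Colv = NA)` suppresses the clustering and the dendrograms.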

6.3 Journal counts

# Number of articles per journal, in descending order
contagem_revistas <- base_graf %>% 
  count(Source.title,
        sort = TRUE) 
View(contagem_revistas)

# Keep journals with at least 20 articles
contagem_revistas %>% filter(n >= 20) %>% 
  ggplot(mapping = aes(x = n, y = reorder(Source.title, n), fill = n)) +
  geom_col(show.legend = FALSE) +
  labs(title = "Journals with the most articles",
       caption = "Journals with 20 or more articles",
       x = "Count",
       y = "Journals")

6.4 Document types

# Distribution of publication years for each document type
base_graf %>% 
  ggplot(mapping = aes(x = Year, y = Document.Type)) +
  geom_boxplot() +
  labs(title = "Publication types",
       caption = "Publication types from 1980 to 2020",
       x = "Year",
       y = "Document types")

7 References

GRÜN, B.; HORNIK, K. topicmodels: An R Package for Fitting Topic Models. Journal of Statistical Software, v. 40, n. 13, p. 1–30, 2011. DOI: 10.18637/jss.v040.i13.

SILGE, J.; ROBINSON, D. Text Mining with R: A Tidy Approach. O’Reilly Media, 2017.

8 Supporting Material

Preview of the available "highlight" styles: by Eran Aviv
Preview of the theme options: by Andrew Zieffler
Tips for customizing tables with the kableExtra package: by Hao Zhu
Complete book on R Markdown: R Markdown Cookbook
R Markdown material in Portuguese: RLadies BH

9 Credits

Material prepared by Daniela Maciel.

Text Mining & Topic Modeling

Daniela Maciel

October 8, 2022