1 Objective

Identify themes in the public-policy literature related to FAPESP funding, using tokenization and topic modeling (LDA, via the topicmodels package). Data were obtained from Overton (https://www.overton.io/).

##  [1] "Language"             "cited_topics"         "cited_classification"
##  [4] "Cited_url"            "Title_menc"           "Cited_doi_menc"      
##  [7] "languages_cited_menc" "Source_ID_menc"       "Source_title_menc"   
## [10] "Source_country_menc"  "Source_type_menc"     "Source_region_menc"  
## [13] "Published_on_menc"    "Document_URL_menc"    "Document_type_menc"  
## [16] "snippet_menc"
## 
##   en   pt 
## 3041    1

2 Checking for duplicates and removing Portuguese-language articles

The following were identified:

  * 1083 duplicated titles;
  * 1 title in Portuguese (pt).

Final file: 2060 observations and 16 variables.
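
The report does not show the code behind this step. A minimal base R sketch of one way to do it, assuming duplicates are detected on the Title_menc column and language on the Language column (the toy data frame and the names `mentions` and `mentions_clean` are invented for illustration):

```r
# Toy data standing in for the Overton export (values are illustrative only).
mentions <- data.frame(
  Language   = c("en", "en", "pt", "en"),
  Title_menc = c("plant health", "plant health", "impactos ambientais", "annual report"),
  stringsAsFactors = FALSE
)

# Assumption: a record is a duplicate when its title repeats an earlier title.
is_duplicate <- duplicated(mentions$Title_menc)

# Keep the first occurrence of each title, restricted to English-language records.
mentions_clean <- mentions[!is_duplicate & mentions$Language == "en", ]

nrow(mentions_clean)
```

Whether the original analysis kept the first or the last occurrence of each duplicated title is not stated; `duplicated()` keeps the first.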

## Rows: 2,060
## Columns: 16
## $ Language             <chr> "en", "en", "en", "en", "en", "en", "en", "en", "…
## $ cited_topics         <chr> "Food | Sugar | Hepatitis E", "Cost of living | T…
## $ cited_classification <chr> "health | health>diseases and conditions | lifest…
## $ Cited_url            <chr> "https://www.overton.io/document.php?policy_docum…
## $ Title_menc           <chr> "seismo info 05 / 2023", "tilapia health: quo vad…
## $ Cited_doi_menc       <chr> "10.1007/s00003-022-01412-x", "10.1111/tbed.14295…
## $ languages_cited_menc <chr> "fre", "eng", "eng", "eng", "por", "eng", "eng", …
## $ Source_ID_menc       <chr> "adminch", "fao", "europa", "efsaeu", "iucn", "go…
## $ Source_title_menc    <chr> "Government of Switzerland", "Food and Agricultur…
## $ Source_country_menc  <chr> "Switzerland", "IGO", "EU", "EU", "France", "Sing…
## $ Source_type_menc     <chr> "government", "igo", "government", "government", …
## $ Source_region_menc   <chr> "Europe", "International Organizations", "Europe"…
## $ Published_on_menc    <chr> "2023-06-30", "2023-02-14", "2022-08-16", "2021-1…
## $ Document_URL_menc    <chr> "https://www.blv.admin.ch/dam/blv/fr/dokumente/le…
## $ Document_type_menc   <chr> "Publication", "Publication", "Publication", "Pub…
## $ snippet_menc         <chr> NA, NA, "The Amazon forest is the largest tropica…

3 Variables of interest

The following variables of interest were defined:

  1. cited_topics: for topic modeling
  2. Title_menc and snippet_menc: for identifying publication typologies

4 Cleaning the data for the "cited_topics" variable

5 Checking tokens

## 
## 2000 2010 2020 
##   15 1019 1009
## # A tibble: 6 × 2
##   token                   n
##   <chr>               <int>
## 1 medical specialties   138
## 2 clinical medicine     137
## 3 health sciences       112
## 4 <NA>                  100
## 5 health care            83
## 6 natural environment    78
## # A tibble: 20 × 2
##    token           n
##    <chr>       <int>
##  1 health        424
##  2 medicine      295
##  3 sciences      212
##  4 medical       168
##  5 clinical      151
##  6 specialties   138
##  7 natural       101
##  8 na            100
##  9 care           97
## 10 environment    87
## 11 biology        85
## 12 food           85
## 13 science        84
## 14 human          78
## 15 economy        74
## 16 chemistry      71
## 17 climate        68
## 18 physical       67
## 19 nature         65
## 20 de             61
## # A tibble: 20 × 2
##    token                       n
##    <chr>                   <int>
##  1 medical specialties       138
##  2 clinical medicine         137
##  3 health sciences           112
##  4 <NA>                      100
##  5 health care                83
##  6 natural environment        78
##  7 medicine health            57
##  8 human activities           53
##  9 climate change             49
## 10 earth sciences             47
## 11 controlled trial           33
## 12 randomized controlled      33
## 13 systematic review          32
## 14 diseases disorders         30
## 15 physical sciences          28
## 16 branches science           27
## 17 specialties health         27
## 18 medicine medical           26
## 19 sciences health            25
## 20 sustainable development    25
## # A tibble: 20 × 2
##    token                               n
##    <chr>                           <int>
##  1 <NA>                              100
##  2 randomized controlled trial        33
##  3 clinical medicine health           27
##  4 medical specialties health         27
##  5 health sciences health             24
##  6 medicine medical specialties       24
##  7 clinical medicine medicine         23
##  8 medical specialties clinical       22
##  9 specialties clinical medicine      22
## 10 agence nationale de                18
## 11 clinical medicine medical          18
## 12 de lalimentation de                18
## 13 de lenvironnement et               18
## 14 de sécurité sanitaire              18
## 15 et du travail                      18
## 16 health medical specialties         18
## 17 lalimentation de lenvironnement    18
## 18 lenvironnement et du               18
## 19 nationale de sécurité              18
## 20 sanitaire de lalimentation         18
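
The tables above look like unigram, bigram, and trigram counts over lower-cased text with punctuation removed. A rough base R sketch of that kind of count follows. The helper `count_ngrams` is hypothetical and not the report's code, and unlike the tables above it drops missing values instead of counting them as `<NA>`:

```r
# Hypothetical helper: split texts into lower-case words and count n-grams.
count_ngrams <- function(texts, n = 2) {
  texts <- texts[!is.na(texts)]  # assumption: missing values are dropped
  grams <- unlist(lapply(texts, function(txt) {
    # Lower-case, strip punctuation (including the "|" separators), split on spaces.
    words <- strsplit(gsub("[[:punct:]]", "", tolower(txt)), "\\s+")[[1]]
    words <- words[nzchar(words)]
    if (length(words) < n) return(character(0))
    # Paste each run of n consecutive words into a single token.
    starts <- seq_len(length(words) - n + 1)
    vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  sort(table(grams), decreasing = TRUE)
}

# Toy input in the same "A | B" format as cited_topics.
topics_text <- c("Clinical medicine | Health sciences",
                 "Clinical medicine | Health care",
                 NA)

unigrams <- count_ngrams(topics_text, n = 1)
bigrams  <- count_ngrams(topics_text, n = 2)
```

Because the separators are removed before the words are paired, n-grams can span two adjacent topic labels ("medicine health" here), which matches entries such as "medicine health" and "specialties health" in the tables above.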

6 Publication typologies

Publication types were defined based on the synonyms listed in Thesaurus (https://www.thesaurus.com/browse/Proposition).

termo_preferido  variacoes
Report           Report, Statement, Description, summary, Record
Plan             Plan, arrangement, deal, idea, intention, method, policy, procedure, program, project, proposal, strategy, system
Survey           Survey, Poll, Questionnaire, analysis, audit, check, inquiry, inspection, sample
Assessment       Assessment, Appraisal, Evaluation, Estimation, estimate, Analysis, Valuation
Review           Review, Examination, Critique, revision
Overview         Overview, Summary, Synopsis, Rundown, Outline, Recap, sketch
Study            Study, Exploration, Research
Guides           Guides, Handbook, Manual, Directory, Tutorial, Primer
Guidelines       Guidelines, Principles, Standards, Protocols, Criteria, Rules, advisement, assignment
Briefing         Briefing, Synopsis, Rundown, Summary, Debrief, Recap
Summary          Summary, Abstract, Recapitulation, Digest
Policy           Policy, Rule, Regulation, Principle, Procedure, policies
Proposition      Proposition, Suggestion, Proposal, Submission, Offer, Motion, hypothesis, invitation, motion, premise
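
How the variations are matched against titles and snippets is not shown. One possible approach is a case-insensitive whole-word search that returns the first preferred term with a matching variation. In the sketch below, the helper `find_typology`, the two-row dictionary excerpt, and the first-match rule are all assumptions made for illustration:

```r
# Excerpt of the typology dictionary (two rows from the table above).
typologies <- list(
  Report = c("Report", "Statement", "Description", "summary", "Record"),
  Study  = c("Study", "Exploration", "Research")
)

# Hypothetical helper: return the first preferred term whose variations
# appear as whole words in the text, or NA when nothing matches.
find_typology <- function(text, dictionary) {
  if (is.na(text)) return(NA_character_)
  for (term in names(dictionary)) {
    pattern <- paste0("\\b(", paste(dictionary[[term]], collapse = "|"), ")\\b")
    if (grepl(pattern, text, ignore.case = TRUE, perl = TRUE)) return(term)
  }
  NA_character_
}

titles <- c("annual report 2022", "research on tilapia health", "seismo info 05 / 2023")
found  <- vapply(titles, find_typology, character(1),
                 dictionary = typologies, USE.NAMES = FALSE)
found
```

Several variations appear under more than one preferred term (for example "Summary" and "Synopsis"), so a first-match rule depends on the order of the dictionary; returning all matching terms would be an alternative.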
## # A tibble: 6 × 21
##   Language cited_topics cited_classification Cited_url Title_menc Cited_doi_menc
##   <chr>    <chr>        <chr>                <chr>     <chr>      <chr>         
## 1 en       Food | Suga… health | health>dis… https://… seismo in… 10.1007/s0000…
## 2 en       Cost of liv… economy, business a… https://… tilapia h… 10.1111/tbed.…
## 3 en       Nature | Ph… environment | envir… https://… deforesta… 10.1016/j.sci…
## 4 en       Botany | Pl… science and technol… https://… plant hea… 10.1007/s1374…
## 5 en       Mariana dam… environment>nature … https://… impactos … 10.1016/j.jha…
## 6 en       CapitaLand … environment>nature … https://… annual re… 10.1007/s4277…
## # ℹ 15 more variables: languages_cited_menc <chr>, Source_ID_menc <chr>,
## #   Source_title_menc <chr>, Source_country_menc <chr>, Source_type_menc <chr>,
## #   Source_region_menc <chr>, Published_on_menc <chr>, Document_URL_menc <chr>,
## #   Document_type_menc <chr>, snippet_menc <chr>, topicos <chr>, ano <dbl>,
## #   decada <dbl>, tit_resumo <chr>, termo_encontrado <chr>

7 Topic modeling

In this step, 15 topics are defined and a visualization is produced showing the 10 most frequent tokens in each topic.
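
A minimal sketch of fitting an LDA model with topicmodels is given below. It is not the report's code: the toy corpus is invented, `k` is reduced to 2 because the corpus is tiny, and a plain integer count matrix is passed to `LDA()` instead of a DocumentTermMatrix. The seed value is arbitrary.

```r
library(topicmodels)

# Toy corpus: one row per (document, term) occurrence; values are illustrative.
tokens <- data.frame(
  doc  = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4),
  term = c("health", "medicine", "clinical",
           "health", "medicine", "care",
           "climate", "environment", "nature",
           "climate", "environment", "economy"),
  stringsAsFactors = FALSE
)

# Build a document-term count matrix (documents in rows, terms in columns).
counts <- table(tokens$doc, tokens$term)
dtm <- matrix(as.integer(counts), nrow = nrow(counts), dimnames = dimnames(counts))

# Fit LDA. The report states k = 15; k = 2 is used here only for the toy data.
k <- 2
lda_fit <- LDA(dtm, k = k, control = list(seed = 1234))

top_terms <- terms(lda_fit, 3)  # highest-probability terms for each topic
doc_topic <- topics(lda_fit)    # most likely topic for each document
table(doc_topic)
```

`terms()` ranks tokens by their estimated probability within each topic, and `table(doc_topic)` gives a per-topic document count of the same shape as the table printed below.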

## <<DocumentTermMatrix (documents: 1960, terms: 2668)>>
## Non-/sparse entries: 9778/5219502
## Sparsity           : 100%
## Maximal term length: 30
## Weighting          : term frequency (tf)
## 
##   1   2   3   4   5   6 
## 444 389 346 273 258 250

Word clouds by topic

8 Checking bigrams by topic

The main bigrams are broken down by topic to make the team's analysis easier.
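
One way to produce such a breakdown is to split the documents by their assigned topic and count bigrams within each group. The base R sketch below assumes the topic assignment is stored in `topico_modelagem`, as in the output that follows. The toy data, the helper `bigrams_of`, and the cutoff of two bigrams per topic are illustrative choices.

```r
# Toy input: topic labels per document plus the topic assigned by the model.
docs <- data.frame(
  cited_topics     = c("Clinical medicine | Health care",
                       "Clinical medicine | Health sciences",
                       "Climate change | Natural environment"),
  topico_modelagem = c(1L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: all bigrams (pairs of consecutive words) in one text.
bigrams_of <- function(txt) {
  words <- strsplit(gsub("[[:punct:]]", "", tolower(txt)), "\\s+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < 2) return(character(0))
  paste(head(words, -1), tail(words, -1))
}

# Count bigrams within each topic and keep the most frequent ones.
top_bigrams <- lapply(
  split(docs$cited_topics, docs$topico_modelagem),
  function(texts) {
    counts <- sort(table(unlist(lapply(texts, bigrams_of))), decreasing = TRUE)
    head(counts, 2)
  }
)
top_bigrams
```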

## # A tibble: 12 × 22
##    Language cited_topics               cited_classification Cited_url Title_menc
##    <chr>    <chr>                      <chr>                <chr>     <chr>     
##  1 en       Psychological trauma | Po… education | science… https://… cie0057 -…
##  2 en       Biology | Branches of gen… science and technol… https://… adequacy …
##  3 en       Human anatomy | Medical s… health>diseases and… https://… american …
##  4 en       Fibromyalgia | Placebo-co… science and technol… https://… behandlin…
##  5 en       Economy | Human activitie… environment | scien… https://… carbon co…
##  6 en       Nature-based solutions | … environment | envir… https://… addressin…
##  7 en       Medicine | Diseases and d… health>diseases and… https://… diagnosti…
##  8 en       Women's empowerment | Soc… lifestyle and leisu… https://… gender eq…
##  9 en       Digital elevation model |… economy, business a… https://… potential…
## 10 en       Species reintroduction | … science and technol… https://… iucn guid…
## 11 en       Soil | Panicum virgatum |… economy, business a… https://… assessing…
## 12 en       Sodium | Tropical climate… economy, business a… https://… geographi…
## # ℹ 17 more variables: Cited_doi_menc <chr>, languages_cited_menc <chr>,
## #   Source_ID_menc <chr>, Source_title_menc <chr>, Source_country_menc <chr>,
## #   Source_type_menc <chr>, Source_region_menc <chr>, Published_on_menc <chr>,
## #   Document_URL_menc <chr>, Document_type_menc <chr>, snippet_menc <chr>,
## #   topicos <chr>, ano <dbl>, decada <dbl>, tit_resumo <chr>,
## #   termo_encontrado <chr>, topico_modelagem <int>