1 Objective

Identify themes in the public policy literature related to FAPESP funding, using tokenization and topic modeling (LDA, via the topicmodels package).

* Data obtained from [Overton](https://www.overton.io/)

##  [1] "Language"             "cited_topics"         "cited_classification"
##  [4] "Cited_url"            "Title_menc"           "Cited_doi_menc"      
##  [7] "languages_cited_menc" "Source_ID_menc"       "Source_title_menc"   
## [10] "Source_country_menc"  "Source_type_menc"     "Source_region_menc"  
## [13] "Published_on_menc"    "Document_URL_menc"    "Document_type_menc"  
## [16] "snippet_menc"
## 
##   en   pt 
## 3041    1

2 Checking for duplicates and removing documents in Portuguese

The following were identified:

* 1083 duplicated titles;
* 1 title in Portuguese (pt).

Final file: 2060 observations and 16 variables.
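The filtering step above can be sketched as follows. This is a minimal illustration on a toy table, not the report's actual code: the object names (dados, titulos_dup, dados_limpos) are hypothetical, and it assumes duplicates are detected on Title_menc and that the first occurrence of each title is kept.

```r
library(dplyr)

# Toy stand-in for the Overton export; only the two columns used here.
dados <- tibble(
  Language   = c("en", "en", "pt", "en"),
  Title_menc = c("tilapia health", "tilapia health", "comunicado", "seismo info")
)

# Titles that occur more than once (assumption: duplicates are judged by title).
titulos_dup <- dados %>%
  count(Title_menc) %>%
  filter(n > 1)

# Keep English documents only, then keep the first row for each title.
dados_limpos <- dados %>%
  filter(Language == "en") %>%
  distinct(Title_menc, .keep_all = TRUE)

glimpse(dados_limpos)
```

Using distinct() with .keep_all = TRUE retains all 16 variables while dropping repeated titles; a different rule (for example, keeping the most recent document per title) would need an explicit arrange() first.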

## Rows: 2,060
## Columns: 16
## $ Language             <chr> "en", "en", "en", "en", "en", "en", "en", "en", "…
## $ cited_topics         <chr> "Food | Sugar | Hepatitis E", "Cost of living | T…
## $ cited_classification <chr> "health | health>diseases and conditions | lifest…
## $ Cited_url            <chr> "https://www.overton.io/document.php?policy_docum…
## $ Title_menc           <chr> "seismo info 05 / 2023", "tilapia health: quo vad…
## $ Cited_doi_menc       <chr> "10.1007/s00003-022-01412-x", "10.1111/tbed.14295…
## $ languages_cited_menc <chr> "fre", "eng", "eng", "eng", "por", "eng", "eng", …
## $ Source_ID_menc       <chr> "adminch", "fao", "europa", "efsaeu", "iucn", "go…
## $ Source_title_menc    <chr> "Government of Switzerland", "Food and Agricultur…
## $ Source_country_menc  <chr> "Switzerland", "IGO", "EU", "EU", "France", "Sing…
## $ Source_type_menc     <chr> "government", "igo", "government", "government", …
## $ Source_region_menc   <chr> "Europe", "International Organizations", "Europe"…
## $ Published_on_menc    <chr> "2023-06-30", "2023-02-14", "2022-08-16", "2021-1…
## $ Document_URL_menc    <chr> "https://www.blv.admin.ch/dam/blv/fr/dokumente/le…
## $ Document_type_menc   <chr> "Publication", "Publication", "Publication", "Pub…
## $ snippet_menc         <chr> NA, NA, "The Amazon forest is the largest tropica…

3 Combining the variables of interest

The following variables of interest were defined and combined into a single text field (the combinada column in the final output):

1. cited_topics
2. Title_menc
3. snippet_menc
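One way to build the combined field is sketched below with tidyr::unite(). The toy data and the choice of a single space as separator are assumptions; only the three column names and the name combinada come from the report.

```r
library(dplyr)
library(tidyr)

# Toy rows with missing values, mimicking the NA entries seen in the data.
dados <- tibble(
  cited_topics = c("Food | Sugar", NA),
  Title_menc   = c("seismo info", "tilapia health"),
  snippet_menc = c(NA, "The Amazon forest")
)

# Paste the three fields into one text column.
# na.rm = TRUE skips missing values instead of writing the literal text "NA".
dados <- dados %>%
  unite("combinada", cited_topics, Title_menc, snippet_menc,
        sep = " ", remove = FALSE, na.rm = TRUE)

dados$combinada
```

Dropping missing values at this point may matter for the token counts: if NA were pasted in as text, it would later be tokenized as the word "na".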

4 Cleaning the data

5 Inspecting tokens

## 
## 2000 2010 2020 
##   15 1019 1009
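The table above appears to count documents by decade. A minimal sketch of how the ano and decada columns (listed in the final output) could be derived, assuming they come from the first four characters of Published_on_menc:

```r
library(dplyr)

# Toy publication dates in the same YYYY-MM-DD format as Published_on_menc.
dados <- tibble(Published_on_menc = c("2023-06-30", "2015-02-14", "2009-11-01"))

dados <- dados %>%
  mutate(
    ano    = as.numeric(substr(Published_on_menc, 1, 4)),  # publication year
    decada = floor(ano / 10) * 10                          # e.g. 2023 -> 2020
  )

table(dados$decada)
```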
## # A tibble: 6 × 2
##   token                   n
##   <chr>               <int>
## 1 climate change        151
## 2 medical specialties   138
## 3 clinical medicine     137
## 4 health sciences       113
## 5 health care            99
## 6 natural environment    82
## # A tibble: 20 × 2
##    token           n
##    <chr>       <int>
##  1 na           1291
##  2 health        883
##  3 de            861
##  4 report        393
##  5 food          377
##  6 la            330
##  7 review        323
##  8 medicine      315
##  9 clinical      305
## 10 risk          286
## 11 assessment    265
## 12 climate       249
## 13 et            245
## 14 research      240
## 15 management    226
## 16 disease       224
## 17 sciences      222
## 18 use           221
## 19 treatment     219
## 20 development   213
## # A tibble: 20 × 2
##    token                       n
##    <chr>                   <int>
##  1 climate change            151
##  2 medical specialties       138
##  3 clinical medicine         137
##  4 health sciences           113
##  5 health care                99
##  6 natural environment        82
##  7 search methods             82
##  8 de la                      79
##  9 systematic review          73
## 10 public health              62
## 11 mental health              58
## 12 medicine health            57
## 13 human activities           56
## 14 methods searched           54
## 15 sustainable development    54
## 16 risk assessment            53
## 17 european commission        52
## 18 earth sciences             47
## 19 controlled trials          45
## 20 latin america              44
## # A tibble: 20 × 2
##    token                                     n
##    <chr>                                 <int>
##  1 search methods searched                  54
##  2 de lanses relatif                        34
##  3 randomized controlled trial              34
##  4 canada.ca publication information        29
##  5 information bibliographic record         29
##  6 publication information bibliographic    29
##  7 publications canada.ca publication       29
##  8 clinical medicine health                 27
##  9 medical specialties health               27
## 10 rapport de lanses                        26
## 11 lanses relatif à                         25
## 12 et rapport de                            24
## 13 health sciences health                   24
## 14 medicine medical specialties             24
## 15 prise en charge                          24
## 16 climate change mitigation                23
## 17 clinical medicine medicine               23
## 18 methods searched cochrane                23
## 19 medical specialties clinical             22
## 20 specialties clinical medicine            22
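Frequency tables like the ones above (single tokens, bigrams, trigrams) can be produced with tidytext. The sketch below is an assumption about the approach, not the report's code: the helper contar_ngramas and the toy texts are hypothetical, and no stop word removal is applied.

```r
library(dplyr)
library(tidytext)

# Toy stand-in for the combined text column.
dados <- tibble(
  doc = 1:3,
  combinada = c("climate change mitigation policy",
                "climate change and public health",
                "public health risk assessment")
)

# Single-word tokens, counted and sorted by frequency.
unigramas <- dados %>%
  unnest_tokens(token, combinada) %>%
  count(token, sort = TRUE)

# Hypothetical helper: count n-grams of a given size (n >= 2).
contar_ngramas <- function(df, n) {
  df %>%
    unnest_tokens(token, combinada, token = "ngrams", n = n) %>%
    filter(!is.na(token)) %>%   # texts shorter than n words yield NA
    count(token, sort = TRUE)
}

bigramas  <- contar_ngramas(dados, 2)
trigramas <- contar_ngramas(dados, 3)

head(bigramas)
```

Sequences such as "de la" or "search methods searched" in the real output suggest that non-English stop words and boilerplate phrases survive tokenization; an anti_join() against a custom stop word list before counting would be one way to filter them.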

6 Topic modeling

In this step, 15 topics are defined and a visualization is produced showing the 10 most frequent tokens in each topic. The assignment table below, however, lists only topics 1 to 10.

## <<DocumentTermMatrix (documents: 2059, terms: 16041)>>
## Non-/sparse entries: 63745/32964674
## Sparsity           : 100%
## Maximal term length: 93
## Weighting          : term frequency (tf)
## 
##   1   2   3   4   5   6   7   8   9  10 
## 240 208 212 261 168 222 187 213 181 167
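The modeling step can be sketched as follows: build a document-term matrix, fit LDA with topicmodels, extract the top tokens per topic, and assign each document its most likely topic. Everything here is illustrative: the toy corpus, the object names, k = 2 and the seed are assumptions, and the real analysis would set k to the number of topics chosen for the report.

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Toy corpus standing in for the combined text column.
dados <- tibble(
  doc = as.character(1:6),
  combinada = c("climate change mitigation forest amazon",
                "forest amazon climate emissions carbon",
                "carbon emissions climate policy energy",
                "public health disease treatment clinical",
                "clinical medicine treatment health trial",
                "disease risk health medicine vaccine")
)

# Token counts per document, with English stop words removed.
contagens <- dados %>%
  unnest_tokens(token, combinada) %>%
  anti_join(stop_words, by = c("token" = "word")) %>%
  count(doc, token)

# Document-term matrix with term-frequency weighting.
dtm <- contagens %>% cast_dtm(doc, token, n)

k <- 2  # assumption for the toy data only
modelo <- LDA(dtm, k = k, control = list(seed = 1234))

# Top 10 tokens per topic, ranked by per-topic word probability (beta).
top_tokens <- tidy(modelo, matrix = "beta") %>%
  group_by(topic) %>%
  slice_max(beta, n = 10, with_ties = FALSE) %>%
  ungroup()

# Most likely topic for each document, and the number of documents per topic.
topico <- topics(modelo)
table(topico)
```

Fixing the seed makes the fit reproducible; without it, topic numbering and membership can change between runs.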

# Word cloud by topic

7 Inspecting bigrams by topic

The main bigrams are tabulated for each topic to make the team's analysis easier.

## 
##   1   2   3   4   5   6   7   8   9  10 
## 240 208 212 261 168 222 187 213 181 167
## # A tibble: 20 × 20
##    Language cited_topics               cited_classification Cited_url Title_menc
##    <chr>    <chr>                      <chr>                <chr>     <chr>     
##  1 en       <NA>                       <NA>                 https://… univerzit…
##  2 en       <NA>                       <NA>                 https://… marija gl…
##  3 en       Yttrium aluminium garnet … science and technol… https://… laser the…
##  4 en       Nutrition | Essential nut… health>health treat… https://… comprehen…
##  5 en       Trace (linear algebra) | … science and technol… https://… specializ…
##  6 en       Leishmaniasis | Visceral … health>diseases and… https://… eurosurve…
##  7 en       Urban heat island | Atmos… weather | science a… https://… guidance …
##  8 en       Nattō | Nutrition | Prote… lifestyle and leisu… https://… fag barbe…
##  9 en       Clinical medicine | Behav… science and technol… https://… cultures …
## 10 en       Medical specialties | Bio… science and technol… https://… the human…
## 11 en       National Sanitary Surveil… health | health>dis… https://… comunicad…
## 12 en       Non-communicable disease … health | health>dis… https://… front-of-…
## 13 en       Hypertension | Hyperchole… health | health>hea… https://… sexual id…
## 14 en       Capitation (healthcare) |… science and technol… https://… supportin…
## 15 en       Chemistry | Lincosamides … economy, business a… https://… selection…
## 16 en       Intensive pig farming | H… health | health>dis… https://… welfare o…
## 17 en       Chronic obstructive pulmo… health>diseases and… https://… managemen…
## 18 en       Bone density | Osteoporos… health | health>dis… https://… consensus…
## 19 en       Tsetse fly | Medical spec… science and technol… https://… bulletin …
## 20 en       Health care | Epidemiolog… health>diseases and… https://… recommand…
## # ℹ 15 more variables: Cited_doi_menc <chr>, languages_cited_menc <chr>,
## #   Source_ID_menc <chr>, Source_title_menc <chr>, Source_country_menc <chr>,
## #   Source_type_menc <chr>, Source_region_menc <chr>, Published_on_menc <chr>,
## #   Document_URL_menc <chr>, Document_type_menc <chr>, snippet_menc <chr>,
## #   combinada <chr>, ano <dbl>, decada <dbl>, topico <int>
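With the topico column attached to each document, the per-topic bigram distribution can be sketched as below. The toy data, the object name bigramas_por_topico and the cutoff of 10 bigrams per topic are assumptions for illustration.

```r
library(dplyr)
library(tidytext)

# Toy documents already labelled with their most likely topic.
dados <- tibble(
  topico = c(1L, 1L, 2L),
  combinada = c("climate change mitigation",
                "climate change policy",
                "public health risk")
)

# Count bigrams within each topic and keep the most frequent ones.
bigramas_por_topico <- dados %>%
  unnest_tokens(token, combinada, token = "ngrams", n = 2) %>%
  filter(!is.na(token)) %>%
  count(topico, token, sort = TRUE) %>%
  group_by(topico) %>%
  slice_max(n, n = 10, with_ties = FALSE) %>%
  ungroup()

bigramas_por_topico
```

Reading the leading bigrams side by side is often easier for labeling topics than single tokens, since pairs such as "climate change" carry more context than either word alone.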