Introduction

  • Literature and evidence review essential in public health practice
  • Exponential growth in volume of literature
  • First steps usually:
    • Developing search strategy
    • Reviewing and filtering abstracts
    • Obtaining full text (if possible)
    • Data extraction

This can be a manual, protracted and iterative process, which may involve using specialised searching services, downloading abstracts, reading and filtering, secondary searching and so on, and may mean sifting many thousands of abstracts.

Often we may just want a rapid overview of the literature to help focus further reviewing.

In this vignette we demonstrate the use of R packages for large scale extraction of abstracts, and analytical techniques for identifying topics or themes in the abstracts.

The vignette is based on a number of R packages:

  1. europepmc - a sophisticated tool which interacts with the Europe PMC API and provides access to additional fields.
  2. adjutant - a fully fledged package with retrieval and clustering functions.
  3. tidytext - a package for text mining using tidy data principles.
  4. Rtsne - uses the tSNE algorithm for dimensionality reduction and cluster visualisation.
  5. dbscan - applies the HDBSCAN algorithm for data clustering.
  6. myScrapers - wraps some functions built on other packages to automate the search, extraction, and filtering process.

We have “hacked” some of the functions in these packages and written additional functions to develop a workflow from searching and retrieval to analysis.

A simple example using europepmc

Searching Europe PubMed Central (epmc)

This is a package which allows searching of EuropePMC via the API.

It can be downloaded from CRAN.


if(!require("europepmc")) install.packages("europepmc")
library(europepmc)

The main function is epmc_search which allows us to search the site and retrieve abstracts, metadata and citation counts.

We’ll use it with the search term “deep learning” AND “public health”.
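The code in this vignette reads the search term from `params$search`, which is how R Markdown exposes document parameters. As a minimal sketch for running the code interactively, a plain list of the same shape can stand in for it. The `limit` value of 1000 is an assumption for illustration, not a value confirmed by the vignette's header.

```r
# Hypothetical stand-in for the R Markdown `params` object.
# The search string follows the term quoted in the text;
# the limit is an assumed value.
params <- list(
  search = '"deep learning" AND "public health"',
  limit  = 1000
)

params$search
```

With this list defined, calls such as `epmc_search(params$search, limit = 10)` can be run outside of a knitted document.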


head(epmc_search(params$search, limit = 10))
#> # A tibble: 6 x 28
#>   id    source pmid  doi   title authorString journalTitle journalVolume
#>   <chr> <chr>  <chr> <chr> <chr> <chr>        <chr>        <chr>        
#> 1 3143~ MED    3143~ 10.3~ A De~ Zhang S, Po~ Stud Health~ 264          
#> 2 3145~ MED    3145~ 10.1~ Arti~ Patel UK, A~ J Neurol     <NA>         
#> 3 3092~ MED    3092~ 10.6~ [Art~ Lin SH, Che~ Hu Li Za Zhi 66           
#> 4 3116~ MED    3116~ 10.1~ "[Ap~ Uchida M, N~ Sangyo Eise~ <NA>         
#> 5 3118~ MED    3118~ 10.1~ Comp~ Soliman M, ~ Epidemics    28           
#> 6 PPR9~ PPR    <NA>  10.1~ Atro~ Ratul MAR, ~ <NA>         <NA>         
#> # ... with 20 more variables: pubYear <chr>, journalIssn <chr>,
#> #   pageInfo <chr>, pubType <chr>, isOpenAccess <chr>, inEPMC <chr>,
#> #   inPMC <chr>, hasPDF <chr>, hasBook <chr>, citedByCount <int>,
#> #   hasReferences <chr>, hasTextMinedTerms <chr>,
#> #   hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>, issue <chr>, pmcid <chr>, hasSuppl <chr>

This doesn’t extract the abstract text or MeSH headings (keywords). To facilitate this we have wrapped the search function in get_full_search in myScrapers.

library(myScrapers)  # provides get_full_search
library(tictoc)      # provides tic() and toc() for timing

tic()
search1 <- get_full_search(search = params$search, limit = params$limit)
toc()
#> 254.51 sec elapsed

head(search1, 20)
#> # A tibble: 20 x 32
#>    id    source pmid  doi   title authorString journalTitle journalVolume
#>    <chr> <chr>  <chr> <chr> <chr> <chr>        <chr>        <chr>        
#>  1 3143~ MED    3143~ 10.3~ A De~ Zhang S, Po~ Stud Health~ 264          
#>  2 3145~ MED    3145~ 10.1~ Arti~ Patel UK, A~ J Neurol     <NA>         
#>  3 3092~ MED    3092~ 10.6~ [Art~ Lin SH, Che~ Hu Li Za Zhi 66           
#>  4 3116~ MED    3116~ 10.1~ "[Ap~ Uchida M, N~ Sangyo Eise~ <NA>         
#>  5 3118~ MED    3118~ 10.1~ Comp~ Soliman M, ~ Epidemics    28           
#>  6 PPR9~ PPR    <NA>  10.1~ Atro~ Ratul MAR, ~ <NA>         <NA>         
#>  7 3114~ MED    3114~ 10.3~ The ~ Cheon S, Ki~ Int J Envir~ 16           
#>  8 3141~ MED    3141~ 10.1~ Sate~ Bruzelius E~ J Am Med In~ 26           
#>  9 3112~ MED    3112~ 10.1~ Deep~ Khalighifar~ J Med Entom~ <NA>         
#> 10 3114~ MED    3114~ 10.2~ Prom~ Balyen L, P~ Asia Pac J ~ 8            
#> 11 3142~ MED    3142~ 10.1~ Auto~ Obeid JS, W~ BMC Med Inf~ 19           
#> 12 3127~ MED    3127~ 10.1~ Mach~ Doupe P, Fa~ Value Health 22           
#> 13 3119~ MED    3119~ 10.1~ Deep~ Graffy PM, ~ Br J Radiol  92           
#> 14 3121~ MED    3121~ 10.3~ Dire~ Qian F, Che~ Int J Envir~ 16           
#> 15 3097~ MED    3097~ 10.1~ Auto~ Graffy PM, ~ Abdom Radio~ <NA>         
#> 16 3097~ MED    3097~ 10.3~ A De~ Lim J, Kim ~ Int J Envir~ 16           
#> 17 3134~ MED    3134~ 10.1~ Erra~ Ruamviboons~ NPJ Digit M~ 2            
#> 18 PPR9~ PPR    <NA>  10.1~ Deve~ Xu J, Xu K,~ <NA>         <NA>         
#> 19 3140~ MED    3140~ 10.1~ Stra~ Wong TY, Sa~ Ophthalmolo~ <NA>         
#> 20 3080~ MED    3080~ 10.1~ Deep~ Lee SM, Seo~ J Thorac Im~ 34           
#> # ... with 24 more variables: pubYear <chr>, journalIssn <chr>,
#> #   pageInfo <chr>, pubType <chr>, isOpenAccess <chr>, inEPMC <chr>,
#> #   inPMC <chr>, hasPDF <chr>, hasBook <chr>, citedByCount <int>,
#> #   hasReferences <chr>, hasTextMinedTerms <chr>,
#> #   hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>, issue <chr>, pmcid <chr>, hasSuppl <chr>,
#> #   name <int>, absText <list>, mesh <list>, keywords <chr>

We can see that the get_full_search function returns additional metadata such as citation counts, whether the article is open access and whether a PDF is available. By default, 1000 article descriptions are downloaded. It also includes MeSH headings and abstract text.

We can see how many articles are available altogether by running epmc_profile.


profile <- epmc_profile(query = params$search)

Running epmc_profile allows us to see that there are 704 articles of which 638 are full text articles, and 489 are open access.
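The vignette does not show how these counts are read from the result. The sketch below assumes that the profile is a list containing a `pubType` table with `name` and `count` columns. It uses a mock object filled with the three figures quoted above rather than a live API call, and `get_count` is a hypothetical helper, not part of europepmc.

```r
# Mock object mimicking the ASSUMED shape of an epmc_profile() result;
# the counts are the ones quoted in the text.
profile_mock <- list(
  pubType = data.frame(
    name  = c("ALL", "FULL TEXT", "OPEN ACCESS"),
    count = c(704L, 638L, 489L),
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper: look up the count for one publication type
get_count <- function(profile, type) {
  profile$pubType$count[profile$pubType$name == type]
}

get_count(profile_mock, "ALL")
get_count(profile_mock, "OPEN ACCESS")
```

If the real object has this shape, the same helper could be applied to `profile` directly.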

Analysing abstracts

Abstracts per year

We can easily look at annual abstract frequency - we can readily see the growth in publication frequency in the last 3 years.


library(dplyr)    # %>% and count()
library(ggplot2)  # plotting

search1 %>%
  count(pubYear) %>%
  ggplot(aes(pubYear, n)) +
  geom_col(fill = "blue") +
  labs(title = "Abstracts per year", 
       subtitle = paste("Search: ", params$search)) +
  phecharts::theme_phe() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Journal frequency

Similarly we can identify the most frequent journals


# Keep the 20 journals with the most articles (ties are retained)
journal_count <- search1 %>%
  count(journalTitle) %>%
  top_n(20, n) %>%
  arrange(desc(n))

journal_count %>%
  ggplot(aes(reorder(journalTitle, n), n)) +
  geom_col(fill = "blue") +
  coord_flip() +
  labs(title = "Journal frequency") +
  phecharts::theme_phe()

Int J Environ Res Public Health and PLoS One are the most frequent journals publishing articles on “deep learning” AND “public health”.

Topic identification

Once we have a data frame of 704 records with abstract text, we can prepare the data for analysis. The create_corpus function is designed for this.


out1 <- search1 %>%
  select(pmid, pmcid ,doi, title, pubYear, citedByCount, absText, journalTitle) %>%
  filter(absText != "NULL") %>%
  mutate(text = paste(title, absText))

Text mining

We will use a method exemplified in the adjutant package which uses unsupervised machine learning to try and cluster similar articles and attach themes.

In this approach we undertake some natural language processing. We will:

  • Split each abstract into single words
  • Remove numbers and common (stop) words
  • Stem each word (reduce it to its root form, e.g. “achieving” becomes “achiev”)
  • Calculate the tf-idf score for each word in each abstract - this gives more weight to words which are more “typical” of the abstracts
  • Create a document feature matrix
  • Undertake dimensionality reduction using tSNE to simplify
  • Run HDBSCAN to identify clusters
  • Name the clusters
  • QA the result

The ultimate output of this analysis is a visualisation of clustered and labelled abstracts and an interactive table.
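To make the tf-idf and document feature matrix steps concrete, here is a small worked sketch in base R on three made-up, already stemmed "abstracts". It is not the create_corpus implementation. It uses the common definition tf = count / words in document and idf = ln(documents / documents containing the word), which is also the definition tidytext documents for bind_tf_idf.

```r
# Toy corpus: three documents of stemmed words (made up for illustration)
docs <- list(
  a = c("deep", "learn", "imag", "imag"),
  b = c("deep", "learn", "health"),
  c = c("health", "data", "data", "studi")
)

# One row per document/word pair with its count n
counts <- do.call(rbind, lapply(names(docs), function(id) {
  tab <- table(docs[[id]])
  data.frame(doc = id, word = names(tab), n = as.integer(tab),
             stringsAsFactors = FALSE)
}))

# Term frequency: share of the document's words taken up by this word
counts$tf <- counts$n / ave(counts$n, counts$doc, FUN = sum)

# Inverse document frequency: rarer words across documents score higher
doc_freq <- table(counts$word)
counts$idf <- log(length(docs) / as.numeric(doc_freq[counts$word]))

counts$tf_idf <- counts$tf * counts$idf

# Document feature matrix: one row per document, one column per word
dfm <- xtabs(tf_idf ~ doc + word, data = counts)
dfm
```

In document `a`, "imag" appears twice in four words and in no other document, so it scores 0.5 × ln(3). "deep" appears in two of three documents and scores only 0.25 × ln(1.5), which is why it is less useful for telling the documents apart.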


library(tidytext)

corp <- create_corpus(df = search1)

head(corp$corpus)
#> # A tibble: 6 x 6
#>   pmid     word       n      tf   idf tf_idf
#>   <chr>    <chr>  <int>   <dbl> <dbl>  <dbl>
#> 1 10463892 achiev     1 0.00671  1.72 0.0116
#> 2 10463892 admiss     1 0.00671  4.41 0.0296
#> 3 10463892 applic     5 0.0336   1.44 0.0482
#> 4 10463892 assess     1 0.00671  1.52 0.0102
#> 5 10463892 autumn     1 0.00671  6.49 0.0436
#> 6 10463892 bsc        1 0.00671  6.49 0.0436

clust <- create_cluster(corpus = corp$corpus, minPts = 10)
#> 19.33 sec elapsed


clust$cluster_size
#> # A tibble: 14 x 2
#>    cluster     n
#>      <dbl> <int>
#>  1       0   212
#>  2       1    10
#>  3       2    15
#>  4       3    21
#>  5       4    39
#>  6       5    26
#>  7       6    16
#>  8       7    26
#>  9       8    19
#> 10       9    19
#> 11      10   105
#> 12      11    19
#> 13      12    65
#> 14      13    69
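The internals of create_cluster are not shown here. As a rough sketch of the tSNE and HDBSCAN steps, assuming a numeric document feature matrix as input, the two packages can be combined as follows. The toy matrix, the perplexity value and the object names are illustrative assumptions, not the settings used by create_cluster.

```r
library(Rtsne)
library(dbscan)

set.seed(42)

# Toy stand-in for a document feature matrix:
# 90 "documents" x 10 features, drawn around three different centres
centres <- rep(c(0, 5, 10), each = 30)
dfm <- matrix(rnorm(90 * 10, mean = centres, sd = 0.5), nrow = 90, ncol = 10)

# Reduce to two dimensions with tSNE (perplexity chosen for this toy data)
tsne <- Rtsne(dfm, dims = 2, perplexity = 10, check_duplicates = FALSE)
embedding <- tsne$Y

# Cluster the 2D embedding; minPts is the minimum cluster size
fit <- hdbscan(embedding, minPts = 10)

# Cluster 0 holds points HDBSCAN treats as noise (not assigned to a cluster)
table(fit$cluster)
```

In HDBSCAN output, cluster 0 is the noise group, so the large cluster 0 in the table above is best read as abstracts that were not assigned to any topic.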

Labelling clusters


labels <- label_clusters(corp$corpus, clustering = clust$clustering, top_n = 4)
#> 0.63 sec elapsed

labels$labels
#> # A tibble: 14 x 2
#> # Groups:   cluster [14]
#>    cluster clus_names                                                     
#>      <dbl> <chr>                                                          
#>  1       0 data-learn-base-studi                                          
#>  2       1 pollut-qualiti-network-data                                    
#>  3       2 resist-antibiot-antimicrobi-health                             
#>  4       3 segment-imag-convolut-neural-deep-network-method-perform-result
#>  5       4 genom-identifi-data-studi                                      
#>  6       5 social-health-data-base                                        
#>  7       6 drug-advers-safeti-model-base-studi                            
#>  8       7 clinic-model-learn-method                                      
#>  9       8 breast-cancer-imag-base-studi                                  
#> 10       9 diabet-retinopathi-screen-imag-patient-learn-base              
#> 11      10 model-learn-data-studi                                         
#> 12      11 ai-intellig-artifici-health-data                               
#> 13      12 data-health-research-develop                                   
#> 14      13 student-educ-learn-studi
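One simple way to build labels of this kind is to take the highest scoring words in each cluster and join them with hyphens. The sketch below does this in base R on made-up scores. `label_top_terms` is a hypothetical helper, and this is not necessarily how label_clusters works (some labels above have more than four terms, so it clearly applies additional rules).

```r
# Made-up tf-idf scores for words in two clusters
scores <- data.frame(
  cluster = c(1, 1, 1, 1, 2, 2, 2),
  word    = c("imag", "segment", "network", "data",
              "antibiot", "resist", "data"),
  tf_idf  = c(0.9, 0.7, 0.4, 0.1, 0.8, 0.6, 0.2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: label each cluster with its top_n words by total tf-idf
label_top_terms <- function(scores, top_n = 2) {
  totals <- aggregate(tf_idf ~ cluster + word, data = scores, FUN = sum)
  totals <- totals[order(totals$cluster, -totals$tf_idf), ]
  labs <- tapply(totals$word, totals$cluster,
                 function(w) paste(head(w, top_n), collapse = "-"))
  data.frame(cluster = as.numeric(names(labs)),
             clus_names = as.vector(labs),
             stringsAsFactors = FALSE)
}

label_top_terms(scores, top_n = 2)
```

Because the words are stemmed before scoring, the labels are made of stems such as "antibiot" rather than full words.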

Visualise


p <- labels$results %>%
  left_join(search1, by = c("pmid.value" = "pmid")) %>%
  ggplot(aes(X1, X2)) +
  geom_point(aes(colour = clustered, size = citedByCount) ) +
  ggrepel::geom_text_repel(data = labels$plot, aes(medX, medY, label = clus_names), size = 3, colour = "#006d2c", alpha = 0.9)

p + scale_alpha_manual(values=c(1,0)) +
  viridis::scale_color_viridis(discrete = TRUE, option = "cividis", alpha = .6) +
  phecharts::theme_phe() +
  theme(panel.background = element_rect(fill = "#f0f0f0")) +
  labs(subtitle = paste("Clustering: ", nrow(labels$plot), " topics" ), 
       title = paste("Search ", "= ", params$search ))

Understanding the labels

Most cited articles


most_cited <- labels$results %>%
  left_join(search1, by = c("pmid.value" = "pmid")) %>%
  filter(cluster !=0) %>%
  group_by(clus_names) %>%
  top_n(n = 3, citedByCount) %>%
  select(clus_names, title, pubYear, citedByCount) %>%
  ungroup() %>%
  arrange(clus_names, -citedByCount)

most_cited %>%
  formattable::formattable()

clus_names | title | pubYear | citedByCount
ai-intellig-artifici-health-data | Artificial intelligence in cancer imaging: Clinical challenges and applications. | 2019 | 4
ai-intellig-artifici-health-data | Global Evolution of Research in Artificial Intelligence in Health and Medicine: A Bibliometric Study. | 2019 | 3
ai-intellig-artifici-health-data | Cognitive computing and eScience in health and life science research: artificial intelligence and obesity intervention programs. | 2017 | 2
breast-cancer-imag-base-studi | Deep learning based tissue analysis predicts outcome in colorectal cancer. | 2018 | 21
breast-cancer-imag-base-studi | Antibody-supervised deep learning for quantification of tumor-infiltrating immune cells in hematoxylin and eosin stained breast cancer samples. | 2016 | 12
breast-cancer-imag-base-studi | Mammographic density and structural features can individually and jointly contribute to breast cancer risk assessment in mammography screening: a case-control study. | 2016 | 7
clinic-model-learn-method | Deep Artificial Neural Networks and Neuromorphic Chips for Big Data Analysis: Pharmaceutical and Bioinformatics Applications. | 2016 | 11
clinic-model-learn-method | Comparing deep learning and concept extraction based methods for patient phenotyping from clinical narratives. | 2018 | 8
clinic-model-learn-method | EliIE: An open-source information extraction system for clinical trial eligibility criteria. | 2017 | 7
data-health-research-develop | Quality collaboratives: lessons from research. | 2002 | 231
data-health-research-develop | Building better biomarkers: brain models in translational neuroimaging. | 2017 | 72
data-health-research-develop | Making sense of big data in health research: Towards an EU action plan. | 2016 | 44
diabet-retinopathi-screen-imag-patient-learn-base | Improved Automated Detection of Diabetic Retinopathy on a Publicly Available Dataset Through Integration of Deep Learning. | 2016 | 48
diabet-retinopathi-screen-imag-patient-learn-base | Retinal Imaging Techniques for Diabetic Retinopathy Screening. | 2016 | 9
diabet-retinopathi-screen-imag-patient-learn-base | Multi-categorical deep learning neural network to classify retinal images: A pilot study employing small database. | 2017 | 7
drug-advers-safeti-model-base-studi | Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features. | 2015 | 63
drug-advers-safeti-model-base-studi | Drug drug interaction extraction from biomedical literature using syntax convolutional neural network. | 2016 | 17
drug-advers-safeti-model-base-studi | Natural Products for Drug Discovery in the 21st Century: Innovations for Novel Drug Discovery. | 2018 | 13
genom-identifi-data-studi | Comprehensive functional genomic resource and integrative model for the human brain. | 2018 | 12
genom-identifi-data-studi | Pleiotropic Mechanisms Indicated for Sex Differences in Autism. | 2016 | 9
genom-identifi-data-studi | Transcriptome-wide isoform-level dysregulation in ASD, schizophrenia, and bipolar disorder. | 2018 | 9
model-learn-data-studi | Deep learning for neuroimaging: a validation study. | 2014 | 53
model-learn-data-studi | Forecasting influenza in Hong Kong with Google search queries and statistical model fusion. | 2017 | 11
model-learn-data-studi | Predicting clinical outcomes from large scale cancer genomic profiles with deep survival models. | 2017 | 10
pollut-qualiti-network-data | Design of a Mobile Low-Cost Sensor Network Using Urban Buses for Real-Time Ubiquitous Noise Monitoring. | 2016 | 6
pollut-qualiti-network-data | A systematic review of data mining and machine learning for air pollution epidemiology. | 2017 | 6
pollut-qualiti-network-data | Long short-term memory neural network for air pollutant concentration predictions: Method development and evaluation. | 2017 | 5
pollut-qualiti-network-data | Towards Personal Exposures: How Technology Is Changing Air Pollution and Health Research. | 2017 | 5
resist-antibiot-antimicrobi-health | DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data. | 2018 | 14
resist-antibiot-antimicrobi-health | Developing an in silico minimum inhibitory concentration panel test for Klebsiella pneumoniae. | 2018 | 6
resist-antibiot-antimicrobi-health | Myxinidin2 and myxinidin3 suppress inflammatory responses through STAT3 and MAPKs to promote wound healing. | 2017 | 4
resist-antibiot-antimicrobi-health | Using Machine Learning To Predict Antimicrobial MICs and Associated Genomic Features for Nontyphoidal Salmonella. | 2019 | 4
segment-imag-convolut-neural-deep-network-method-perform-result | Urinary bladder segmentation in CT urography using deep-learning convolutional neural network and level sets. | 2016 | 29
segment-imag-convolut-neural-deep-network-method-perform-result | ISLES 2015 - A public evaluation benchmark for ischemic stroke lesion segmentation from multispectral MRI. | 2017 | 24
segment-imag-convolut-neural-deep-network-method-perform-result | Deep convolutional neural network and 3D deformable approach for tissue segmentation in musculoskeletal magnetic resonance imaging. | 2018 | 12
social-health-data-base | The use of social networking platforms for sexual health promotion: identifying key strategies for successful user engagement. | 2015 | 16
social-health-data-base | Researching Mental Health Disorders in the Era of Social Media: Systematic Review. | 2017 | 14
social-health-data-base | Characterizing the Discussion of Antibiotics in the Twittersphere: What is the Bigger Picture? | 2015 | 13
student-educ-learn-studi | Clinical experience, performance in final examinations, and learning style in medical students: prospective study. | 1998 | 93
student-educ-learn-studi | Intercalated degrees, learning styles, and career preferences: prospective longitudinal study of UK medical students. | 1999 | 66
student-educ-learn-studi | Randomised controlled trial of clinical decision support tools to improve learning of evidence based medicine in medical students. | 2003 | 62

Use of keywords

We can review the commonest MeSH headings associated with each cluster tag.


labels$results %>%
  left_join(search1, by = c("pmid.value" = "pmid")) %>%
  select(clus_names, mesh) %>%
  filter(mesh != "NULL") %>%
  unnest(mesh) %>%
  count(clus_names, mesh, sort = TRUE) %>%
  filter(n < 30) %>%
  ungroup() %>%
  group_by(clus_names) %>%
  top_n(10, n) %>%
  mutate(summary = paste(mesh, collapse = "; " )) %>%
  select(-c(mesh, n)) %>%
  distinct() %>%
  arrange(clus_names) %>%
  knitr::kable()

clus_names | summary
ai-intellig-artifici-health-data | Artificial Intelligence; Big Data; Public Health
breast-cancer-imag-base-studi | Humans; Breast Neoplasms; Female; Middle Aged; Aged; Breast; Machine Learning; Mammography; Retrospective Studies; Adult; Aged, 80 and over; Algorithms; Breast Density; Deep Learning; Early Detection of Cancer; Image Interpretation, Computer-Assisted; Image Processing, Computer-Assisted; Magnetic Resonance Imaging; Male; Neoplasms; Risk Assessment; ROC Curve; Sensitivity and Specificity; Ultrasonography, Mammary
clinic-model-learn-method | Humans; Algorithms; Electronic Health Records; Natural Language Processing; Machine Learning; Neural Networks (Computer); Datasets as Topic; International Classification of Diseases; Artificial Intelligence; Bayes Theorem; Phenotype
data-health-research-develop | Public Health; Data Mining; Databases, Factual; Delivery of Health Care; Medical Informatics; Artificial Intelligence; Biomedical Research; Electronic Health Records; Machine Learning; Translational Medical Research
data-learn-base-studi | Female; Machine Learning; Male; Algorithms; Deep Learning; Middle Aged; Neural Networks (Computer); Aged; Image Processing, Computer-Assisted; Adult; Tomography, X-Ray Computed
diabet-retinopathi-screen-imag-patient-learn-base | Humans; Diabetic Retinopathy; Female; Male; Aged; Aged, 80 and over; Middle Aged; Retina; Adult; Cross-Sectional Studies; Diagnosis, Computer-Assisted; Diagnostic Techniques, Ophthalmological; Image Processing, Computer-Assisted; Neural Networks (Computer); Reproducibility of Results; ROC Curve; Young Adult
drug-advers-safeti-model-base-studi | Humans; Artificial Intelligence; Data Mining; Drug-Related Side Effects and Adverse Reactions; Neural Networks (Computer); Social Media; Area Under Curve; Automation, Laboratory; Back Pain; Biological Products; Computational Biology; Computer Simulation; Databases as Topic; Deep Learning; Drug Design; Drug Discovery; Drug Industry; Drug Interactions; Information Storage and Retrieval; Models, Chemical; Models, Theoretical; Natural Language Processing; Necrosis; Pharmacovigilance; Phytotherapy; Plants, Medicinal; Programming Languages; Publications; Robotics; Semantics; Software; Supervised Machine Learning
genom-identifi-data-studi | Humans; Genome-Wide Association Study; Computational Biology; Genetic Predisposition to Disease; Algorithms; Databases, Genetic; Deep Learning; Female; Genome, Human; Genomics; Polymorphism, Single Nucleotide
model-learn-data-studi | Machine Learning; Female; Neural Networks (Computer); Male; Algorithms; Deep Learning; Adult; Middle Aged; Aged; China; Prognosis
pollut-qualiti-network-data | Air Pollution; Air Pollutants; Environmental Monitoring; Humans; Neural Networks (Computer); Cities; Forecasting; Algorithms; Automation; Beijing; Data Mining; Deep Learning; Electroencephalography; Electrooculography; Environmental Exposure; Epidemiologic Studies; Hong Kong; Inventions; Machine Learning; Models, Statistical; Models, Theoretical; Particulate Matter; Polysomnography; Sleep Stages; Sleep Wake Disorders; Smartphone
resist-antibiot-antimicrobi-health | Humans; Anti-Bacterial Agents; Drug Resistance, Multiple, Bacterial; Machine Learning; Microbial Sensitivity Tests; Antimicrobial Cationic Peptides; Biofilms; Cell Membrane; DNA, Bacterial; Genome, Bacterial; High-Throughput Nucleotide Sequencing; Lipopolysaccharides; Sequence Analysis, DNA; Whole Genome Sequencing
segment-imag-convolut-neural-deep-network-method-perform-result | Humans; Female; Male; Algorithms; Image Processing, Computer-Assisted; Middle Aged; Neural Networks (Computer); Adult; Magnetic Resonance Imaging; Aged; Deep Learning; Image Interpretation, Computer-Assisted; Young Adult
social-health-data-base | Humans; Social Media; Neural Networks (Computer); Machine Learning; Adolescent; Adult; Algorithms; Analgesics, Opioid; Deep Learning; Female; Internet; Male; Middle Aged; Public Opinion; Young Adult
student-educ-learn-studi | Female; Male; Curriculum; Students, Medical; Educational Measurement; Learning; Adult; Education, Medical, Undergraduate; Problem-Based Learning; Young Adult

Investigating individual themes to identify full-text articles

Let’s explore articles for which Public Health is a MeSH heading.


ph <- labels$results %>%
  left_join(search1, by = c("pmid.value" = "pmid")) %>%
  filter(str_detect(keywords, "Public Health"))

ph %>%
  count(clus_names, sort = TRUE)
#> # A tibble: 4 x 2
#>   clus_names                           n
#>   <chr>                            <int>
#> 1 data-health-research-develop         7
#> 2 model-learn-data-studi               2
#> 3 student-educ-learn-studi             2
#> 4 ai-intellig-artifici-health-data     1

There is one article tagged with ai-intellig-artifici-health-data which has Public Health as a MeSH heading. We can use epmc_ftxt to extract the full text of an article.

library(rvest)



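# Articles in the cluster of interest that have a PMC id.
# The cluster name must match the label shown in labels$labels exactly.
get_pmcids <- ph %>%
  filter(clus_names == "data-health-research-develop") %>%
  select(id, pmcid) %>%
  filter(!is.na(pmcid))

# Assumption: `ids` and `get_ids` were not defined in the original code.
# Here `ids` is rebuilt from get_pmcids with enframe(), which gives the
# `value` column that the join below relies on.
ids <- enframe(get_pmcids$id)

# Look up the full record for each article id
details <- mutate(ids, details = map(value, epmc_details))

# Keep only the full text links that are freely available
full_text <- details %>%
  mutate(full_text = map(details, "ftx")) %>%
  unnest(full_text) %>%
  filter(availability == "Free") %>%
  left_join(get_pmcids, by = c("value" = "id")) %>%
  distinct()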


full_text <- europepmc::epmc_ftxt("PMC5171550")

ft <- full_text %>%
  html_text()

ft %>%
  str_split(., "\\. ") %>%
  enframe() %>%
  formattable::formattable()

Finally we can gather all the abstracts into a single interactive table which can be searched, filtered and shared.


labels$results %>%
  left_join(search1, by = c("pmid.value" = "pmid"))  %>%
  select(cluster, clus_names, doi, title, journalTitle, pubYear, citedByCount, absText) %>%
  mutate(doi = paste0("<a href='https://doi.org/", doi, "'>doi</a>")) %>%  # DOIs resolve via doi.org
  DT::datatable(escape = FALSE, extensions = c('Responsive','Buttons', 'FixedHeader'), 
                filter = "top", 
  options = list(
    autoWidth = TRUE,
    columnDefs = list(list(width = '450px')),
    dom = 'Bfrtip',
    buttons = c('csv', 'excel'),
    fixedHeader=TRUE) 
  )