Julian Flowers 2020-02-06
This vignette shows how to extract abstracts from PubMed and perform simple topic modelling on them. It uses functions in the myScrapers package, which can be downloaded as below.
The first step is to search PubMed. We use the pubmedAbstractR function, a wrapper for RISmed that interacts with the NCBI E-utilities API. It takes 5 arguments:
In addition, it is recommended to obtain an API key for NCBI. Instructions on how to obtain a key are available from here. Once you have a key you should store it as an environment variable.
There are two other arguments to extract authors and MeSH headings (keywords). These are set to FALSE by default.
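Storing the key as an environment variable, as recommended above, might look like the following minimal sketch. The variable name ncbi_key matches the one read by the search code in this vignette; the key value is a placeholder, not a real key.

```r
# Set the key for the current R session only; the value is a placeholder.
Sys.setenv(ncbi_key = "your-api-key-here")

# Read it back the same way the search code does.
key <- Sys.getenv("ncbi_key")

# Sys.getenv() returns "" for an unset variable, so this is a cheap sanity check.
if (!nzchar(key)) stop("ncbi_key is not set")
```

To keep the key across sessions, a line of the form ncbi_key=your-api-key-here can be added to the .Renviron file in your home directory, which R reads at startup.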
In this example we show how to search for articles on data science and big data in public health.
## load key
key <- Sys.getenv("ncbi_key")
## initialise - initially with n = 1 - this will tell us how many abstracts our query returns
query <- "(data science[mh] OR data science[tw] OR big data[tw]) (public health[mh] OR population health[mh] OR public health surveillance[mh] OR public health practice[mh] OR public health informatics[mh])"
n <- 1
end <- 2020
## search
out <- pubmedAbstractR(search = query, n = n, end = end, ncbi_key = key, keyword = TRUE)
#> Loading required package: RISmed
#> Loading required package: glue
#>
#> Attaching package: 'glue'
#> The following object is masked from 'package:dplyr':
#>
#> collapse
#> Please wait...Your query is ("data science"[MeSH Terms] OR data science[tw] OR big data[tw]) AND ("public health"[MeSH Terms] OR "population health"[MeSH Terms] OR "public health surveillance"[MeSH Terms] OR "public health practice"[MeSH Terms] OR "public health informatics"[MeSH Terms]) AND 2000[PDAT] : 2020[PDAT]. This returns 2210 abstracts. By default 1000 abstracts are downloaded. You downloaded 1 abstracts. To retrieve more set 'n =' argument to the desired value
#> Joining, by = "DOI"
The query returns 2210 abstracts. The search term is translated by the API into (“data science”[MeSH Terms] OR data science[tw] OR big data[tw]) AND (“public health”[MeSH Terms] OR “population health”[MeSH Terms] OR “public health surveillance”[MeSH Terms] OR “public health practice”[MeSH Terms] OR “public health informatics”[MeSH Terms]) AND 2000[PDAT] : 2020[PDAT]
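The translation shows that the two parenthesised groups are combined with AND. As a sketch of one way to make that explicit, the query string could be assembled from its parts in base R. The helper or_block() is hypothetical and not part of myScrapers or RISmed; the terms and field tags are those used in the query above.

```r
# Hypothetical helper: join already-tagged terms with OR and wrap them in parentheses.
or_block <- function(tagged_terms) {
  paste0("(", paste(tagged_terms, collapse = " OR "), ")")
}

topic <- or_block(c("data science[mh]", "data science[tw]", "big data[tw]"))

setting <- or_block(c(
  "public health[mh]",
  "population health[mh]",
  "public health surveillance[mh]",
  "public health practice[mh]",
  "public health informatics[mh]"
))

# Combine the two groups with an explicit AND.
query <- paste(topic, setting, sep = " AND ")
```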
Let us download all 2210 abstracts.
n <- out$n_articles
results <- pubmedAbstractR(search = query, n = n, end = end, ncbi_key = key, keyword = TRUE)
#> Please wait...Your query is ("data science"[MeSH Terms] OR data science[tw] OR big data[tw]) AND ("public health"[MeSH Terms] OR "population health"[MeSH Terms] OR "public health surveillance"[MeSH Terms] OR "public health practice"[MeSH Terms] OR "public health informatics"[MeSH Terms]) AND 2000[PDAT] : 2020[PDAT]. This returns 2210 abstracts. By default 1000 abstracts are downloaded. You downloaded 2210 abstracts. To retrieve more set 'n =' argument to the desired value
#> Joining, by = "DOI"
head(results$abstracts)
#> # A tibble: 6 x 6
#>   title                  abstract                 journal DOI    year  keyword
#>   <chr>                  <chr>                    <chr>   <chr>  <chr> <chr>
#> 1 Pharmacoepidemiology … Pharmacoepidemiology is… Chimia  31883… 2019  Data Sci…
#> 2 Pharmacoepidemiology … Pharmacoepidemiology is… Chimia  31883… 2019  Database…
#> 3 Pharmacoepidemiology … Pharmacoepidemiology is… Chimia  31883… 2019  Humans
#> 4 Pharmacoepidemiology … Pharmacoepidemiology is… Chimia  31883… 2019  Machine …
#> 5 Pharmacoepidemiology … Pharmacoepidemiology is… Chimia  31883… 2019  Pharmaco…
#> 6 Pharmacoepidemiology … Pharmacoepidemiology is… Chimia  31883… 2019  Precisio…
We will use a method based on the adjutant package. This proceeds in 3 steps:
dbscan or mclust
In myScrapers there are 3 functions to achieve this flow:
create_abstract_corpus. This takes the abstracts as input and creates the tf_idf matrix. To make this work we need to rename some of the fields (see below) and clean up the abstracts.
create_abstract_cluster. This takes the corpus and allocates a cluster to each abstract. It has a minPts parameter which can be altered by the user; this reflects the number of nearest neighbours to use for clustering. The default is 20.
create_cluster_labels. This assigns labels to each cluster based on term frequency and tf_idf values within the features for each cluster.
library(tm)
#> Loading required package: NLP
#>
#> Attaching package: 'NLP'
#> The following object is masked from 'package:ggplot2':
#>
#> annotate
rename_results <- results$abstracts %>%
  rename(pmid = DOI, absText = abstract, pubYear = year) %>%
  select(-keyword) %>%
  distinct()
## remove stopwords and numbers
rename_results <- rename_results %>%
  mutate(absText = tm::removeNumbers(absText),
         absText = tm::removeWords(absText, c(stopwords("en"), "abstracttext")))
## tokenise
rename_corp <- create_abstract_corpus(rename_results)
#> Loading required package: tidytext
#> Joining, by = "word"
head(rename_corp$corpus)
#> # A tibble: 6 x 6
#>   pmid     word           n      tf   idf tf_idf
#>   <chr>    <chr>      <int>   <dbl> <dbl>  <dbl>
#> 1 15043761 analysi        6 0.0451   1.15 0.0518
#> 2 15043761 articl         1 0.00752  2.23 0.0168
#> 3 15043761 author         1 0.00752  3.48 0.0262
#> 4 15043761 background     1 0.00752  2.51 0.0189
#> 5 15043761 categor        1 0.00752  4.40 0.0331
#> 6 15043761 class          3 0.0226   3.83 0.0864
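To make the tf, idf and tf_idf columns concrete, here is a small base R sketch that computes them by hand. It assumes the usual definitions (tf is a word's share of the words in its document, idf is the natural log of the number of documents divided by the number of documents containing the word), which is consistent with the values printed above. The toy counts are made up, and this is not the code create_abstract_corpus itself runs.

```r
# Toy word counts: one row per (document, word) pair. Values are made up.
counts <- data.frame(
  pmid = c("a", "a", "b", "b", "c"),
  word = c("data", "health", "data", "cancer", "data"),
  n    = c(3, 1, 2, 2, 4),
  stringsAsFactors = FALSE
)

# tf: share of a document's words accounted for by this word.
counts$tf <- counts$n / ave(counts$n, counts$pmid, FUN = sum)

# idf: log(number of documents / number of documents containing the word).
# With one row per (document, word), rows per word equals documents per word.
n_docs <- length(unique(counts$pmid))
docs_with_word <- ave(counts$n, counts$word, FUN = length)
counts$idf <- log(n_docs / docs_with_word)

# tf_idf: high for words that are frequent in a document but rare across documents.
counts$tf_idf <- counts$tf * counts$idf
```

In this toy corpus "data" appears in every document, so its idf and tf_idf are zero. Words like this contribute little to separating abstracts.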
We’ll pass the corpus to the clustering algorithm and set a value for the minPts parameter. This trades off the number of clusters against the proportion of abstracts which can’t be properly clustered by the dbscan algorithm.
set.seed(42)
rename_clust <- create_abstract_cluster(rename_corp$corpus, minPts = 10)
#> If there are a small number of abstracts, set perplexity value
#> to less than 30% of abstract count
#> Loading required package: Rtsne
#> Loading required package: dbscan
#> Loading required package: tictoc
#> 172.338 sec elapsed
rename_clust$cluster_count
#> [1] 32
This generates 32 clusters.
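When comparing minPts values it can help to look at how many abstracts were left unclustered as well as at the cluster count. The sketch below does this for a made-up vector of cluster assignments, assuming that cluster 0 marks unclustered abstracts (the data frame printed later in this vignette labels cluster 0 as not-clustered). The vector is illustrative only and is not taken from rename_clust.

```r
# Made-up cluster assignments; 0 is assumed to mark unclustered abstracts.
cluster <- c(0, 0, 1, 1, 1, 2, 2, 0, 3, 3)

# Number of abstracts in each cluster, including the unclustered group.
cluster_sizes <- table(cluster)

# Proportion of abstracts that could not be assigned to a cluster.
prop_unclustered <- mean(cluster == 0)

# Number of clusters, not counting the unclustered group.
n_real_clusters <- length(setdiff(unique(cluster), 0))
```

Repeating this for a few minPts values gives a simple way to see the trade-off between cluster count and unclustered abstracts.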
We can now label them and plot.
labels <- create_cluster_labels(rename_corp$corpus, rename_clust$clustering, top_n = 6)
#> Joining, by = "cluster"
labels$labels
#> # A tibble: 32 x 2
#> # Groups:   cluster [32]
#>    cluster clus_names
#>      <dbl> <chr>
#>  1       0 health-research-analysi-studi-base-data
#>  2       1 na
#>  3       2 nurs-health-patient-care-research-data
#>  4       3 cancer-patient-clinic-base-studi-data
#>  5       4 metric-method-provid-studi-base-data
#>  6       5 de-identif-inform-develop-patient-data
#>  7       6 ethic-opportun-challeng-clinic-research-data
#>  8       7 registri-multipl-patient-care-studi-clinic-research-includ-data
#>  9       8 epidemiologi-diseas-research-health-studi-data
#> 10       9 pitfal-research-potenti-provid-studi-data
#> # … with 22 more rows
labels$results %>%
  count(clus_names) %>%
  ggplot(aes(reorder(clus_names, n), n)) +
  geom_col() +
  coord_flip()
This has generated 32 groups of abstracts:
For the purposes of our study, clusters 0, 1, 4, 5, 8, 21 and 30 are of most interest.
We can filter abstracts by cluster, but also by keywords.
filtered_data <- labels$results %>%
  left_join(results$abstracts, by = c("pmid.value" = "DOI")) %>%
  filter(cluster %in% c(0, 1, 4, 5, 8, 21, 31))
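This leaves 15072 rows. Because results$abstracts holds one row per abstract and keyword, this counts abstract-keyword rows rather than distinct abstracts.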
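A toy illustration of how the join changes the row count may help here. The sketch uses base R's merge() as a stand-in for left_join(), and all of the data are made up; only the column names DOI, keyword, pmid.value and cluster are taken from the vignette.

```r
# Made-up abstracts table: one row per abstract-keyword pair.
abstracts <- data.frame(
  DOI     = c("1", "1", "1", "2", "2"),
  keyword = c("Humans", "Data Science", "Public Health", "Humans", "Big Data"),
  stringsAsFactors = FALSE
)

# Made-up clustering result: one row per abstract.
clusters <- data.frame(
  pmid.value = c("1", "2"),
  cluster    = c(0, 4),
  stringsAsFactors = FALSE
)

# Left join: every cluster row is repeated once for each matching keyword row.
joined <- merge(clusters, abstracts,
                by.x = "pmid.value", by.y = "DOI", all.x = TRUE)

n_rows      <- nrow(joined)                       # abstract-keyword rows
n_abstracts <- length(unique(joined$pmid.value))  # distinct abstracts
```

With dplyr loaded, n_distinct(filtered_data$pmid.value) should give the equivalent count of distinct abstracts for the real data.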
We can undertake further filtering, for example to find which articles mention public health and/or data science. There are no abstracts which use the phrase public health data science.
labels$results %>%
  left_join(results$abstracts, by = c("pmid.value" = "DOI")) %>%
  filter(str_detect(abstract, "public health data science") | str_detect(keyword, "Public Health|Data Science"))
#> pmid.name pmid.value X1 X2 cluster V2
#> 1 2 15503880 -0.6513155 -19.7271766 12 -0.6513155
#> 2 2 15503880 -0.6513155 -19.7271766 12 -0.6513155
#> 3 27 23462917 17.9459258 12.1871987 8 17.9459258
#> 4 34 23590742 12.7847927 14.1619297 0 12.7847927
#> 5 34 23590742 12.7847927 14.1619297 0 12.7847927
#> 6 74 24169298 -1.3083581 3.7615436 0 -1.3083581
#> 7 93 24513169 3.5925344 14.2637039 21 3.5925344
#> 8 193 25080685 10.4812190 13.7290262 21 10.4812190
#> 9 230 25180726 2.8562726 -12.1389331 24 2.8562726
#> 10 264 25390275 10.2845473 24.3542357 21 10.2845473
#> 11 269 25423057 6.5545404 -3.9828030 0 6.5545404
#> 12 271 25430753 18.8560610 -14.3746068 15 18.8560610
#> 13 284 25533619 5.8973672 -1.0198711 0 5.8973672
#> 14 315 25657237 4.2153522 -10.1483175 0 4.2153522
#> 15 316 25664461 8.6975353 -18.9493868 6 8.6975353
#> 16 317 25664660 8.6594644 -18.9326656 6 8.6594644
#> 17 330 25729109 -8.6293281 9.2041832 0 -8.6293281
#> 18 336 25747566 -15.6200696 2.3829494 30 -15.6200696
#> 19 340 25756221 17.9352369 12.5265021 8 17.9352369
#> 20 350 25787904 -18.4901506 -9.6701763 0 -18.4901506
#> 21 375 25895907 -23.5319982 -3.1617573 4 -23.5319982
#> 22 391 25976024 8.5063506 16.3966099 21 8.5063506
#> 23 410 26058402 8.5536261 19.2940582 21 8.5536261
#> 24 449 26180952 3.9031487 3.5906941 23 3.9031487
#> 25 484 26310351 7.9053921 0.9970058 0 7.9053921
#> 26 501 26386548 1.3053627 16.3198468 0 1.3053627
#> 27 518 26443419 16.1132002 11.5440025 8 16.1132002
#> 28 530 26493266 17.9530703 12.8277777 8 17.9530703
#> 29 569 26597027 19.7411767 13.3930875 8 19.7411767
#> 30 576 26613831 0.5369444 -9.5206653 26 0.5369444
#> 31 616 26749911 13.4330696 1.7443322 10 13.4330696
#> 32 628 26797628 6.0443641 18.7808677 21 6.0443641
#> 33 678 26958160 -0.9113698 -6.1323074 0 -0.9113698
#> 34 701 27029875 19.4242220 -1.0230197 3 19.4242220
#> 35 727 27107447 -23.6767822 -1.5461911 4 -23.6767822
#> 36 734 27133768 0.2032795 -5.5649371 27 0.2032795
#> 37 816 27437065 -4.6193560 -3.8381548 0 -4.6193560
#> 38 839 27489028 -1.7836963 -13.0842509 0 -1.7836963
#> 39 848 27531941 6.2996925 -3.8482034 0 6.2996925
#> 40 919 27796839 -5.0326016 14.6370219 19 -5.0326016
#> 41 933 27830257 15.3363535 11.0338098 8 15.3363535
#> 42 944 27873357 14.9430598 10.7895758 8 14.9430598
#> 43 949 27897014 -4.9928722 19.6063307 0 -4.9928722
#> 44 965 27919863 5.7993303 3.9638841 0 5.7993303
#> 45 981 28000011 -5.2419535 -12.0828980 18 -5.2419535
#> 46 987 28038933 17.8208559 12.2645485 8 17.8208559
#> 47 1056 28268937 -11.1160009 8.2312798 0 -11.1160009
#> 48 1060 28276633 14.1001197 16.6310472 0 14.1001197
#> 49 1078 28348842 -1.4515173 -4.2104052 0 -1.4515173
#> 50 1079 28349220 -15.4160053 4.7389646 0 -15.4160053
#> 51 1080 28352101 -11.9913490 7.0312553 31 -11.9913490
#> 52 1112 28468831 1.1479994 -9.2940665 26 1.1479994
#> 53 1142 28578585 -14.8664316 18.0658715 0 -14.8664316
#> 54 1148 28595734 17.8965533 12.4750716 8 17.8965533
#> 55 1177 28679894 1.5182927 -8.4228412 26 1.5182927
#> 56 1211 28828569 -1.2473699 0.9229922 25 -1.2473699
#> 57 1216 28830109 16.7099179 10.8837881 8 16.7099179
#> 58 1232 28867810 5.1441174 12.4273432 21 5.1441174
#> 59 1243 28918390 -0.3964918 -3.1114973 27 -0.3964918
#> 60 1257 28968751 -1.1765904 -17.8680487 0 -1.1765904
#> 61 1259 28973597 -1.0552768 -1.5147174 0 -1.0552768
#> 62 1269 28985648 -0.4650058 5.1938524 22 -0.4650058
#> 63 1300 29069394 -3.3786228 -6.1332039 0 -3.3786228
#> 64 1309 29091181 -8.5848362 -26.2597285 2 -8.5848362
#> 65 1318 29108078 0.9644624 -6.9299882 26 0.9644624
#> 66 1351 29190281 10.2712410 20.1757438 21 10.2712410
#> 67 1355 29214005 -11.4767510 -15.4986852 0 -11.4767510
#> 68 1356 29214566 -5.3043272 11.3072299 19 -5.3043272
#> 69 1364 29221465 7.1550806 16.0276878 21 7.1550806
#> 70 1365 29221544 1.5024214 -13.0799031 24 1.5024214
#> 71 1383 29240341 14.9595587 9.7912510 8 14.9595587
#> 72 1390 29261408 8.5024426 13.1403855 21 8.5024426
#> 73 1403 29281906 -19.0439329 -0.7067931 0 -19.0439329
#> 74 1405 29287746 -13.2874009 -1.5060592 0 -13.2874009
#> 75 1409 29294362 9.1163057 -12.3162993 0 9.1163057
#> 76 1429 29335460 6.3071276 -15.4734004 0 6.3071276
#> 77 1434 29348445 0.5821453 -18.9138888 0 0.5821453
#> 78 1435 29348446 -2.2805947 -15.2919086 13 -2.2805947
#> 79 1436 29348447 -10.8598933 12.1998295 0 -10.8598933
#> 80 1437 29348449 -0.4504712 -22.0547287 11 -0.4504712
#> 81 1438 29348450 6.6379766 -15.5483583 0 6.6379766
#> 82 1439 29348451 0.4114133 -19.9572929 12 0.4114133
#> 83 1467 29427233 -13.6580699 -1.1899253 30 -13.6580699
#> 84 1474 29448009 -25.3902763 -2.4085651 4 -25.3902763
#> 85 1476 29449298 -4.9674317 8.9912397 0 -4.9674317
#> 86 1490 29489633 -7.2696453 -22.1501665 2 -7.2696453
#> 87 1501 29517020 16.2533409 -2.8685270 0 16.2533409
#> 88 1516 29547688 13.6398270 -12.1987474 15 13.6398270
#> 89 1524 29566116 27.9335530 1.2785310 3 27.9335530
#> 90 1526 29568086 -4.8529615 11.9998773 19 -4.8529615
#> 91 1555 29631232 -3.5799003 13.0977150 19 -3.5799003
#> 92 1557 29649525 -0.8322110 11.6141543 0 -0.8322110
#> 93 1596 29733705 16.6958552 -13.7227419 15 16.6958552
#> 94 1598 29737474 24.4784013 0.4252312 3 24.4784013
#> 95 1608 29768712 3.6279691 -14.2608588 24 3.6279691
#> 96 1628 29801952 9.5753299 21.4353375 21 9.5753299
#> 97 1665 29900518 10.6940256 20.0539956 21 10.6940256
#> 98 1673 29921346 -13.2662128 2.1722850 30 -13.2662128
#> 99 1681 29976205 -19.9096272 -6.1103093 0 -19.9096272
#> 100 1682 29980494 -22.0186313 4.2840866 20 -22.0186313
#> 101 1698 30015248 0.5048556 -20.0602806 0 0.5048556
#> 102 1698 30015248 0.5048556 -20.0602806 0 0.5048556
#> 103 1700 30019795 2.4321626 -7.8317835 0 2.4321626
#> 104 1701 30019964 -14.3080611 -11.4450530 0 -14.3080611
#> 105 1721 30064068 -2.8984793 -16.5945497 13 -2.8984793
#> 106 1727 30089004 -34.1989933 2.3096163 1 -34.1989933
#> 107 1728 30091838 0.1817575 -7.0123455 0 0.1817575
#> 108 1732 30116096 -0.9373005 1.4897928 25 -0.9373005
#> 109 1743 30134474 9.0577214 6.0619761 0 9.0577214
#> 110 1751 30146994 6.2772796 13.7510284 21 6.2772796
#> 111 1758 30157524 9.7269043 21.8091544 21 9.7269043
#> 112 1772 30192202 -1.7832015 -14.6834152 0 -1.7832015
#> 113 1777 30202055 -3.1672423 13.0863658 0 -3.1672423
#> 114 1784 30220531 -7.7806753 12.5539743 19 -7.7806753
#> 115 1789 30232241 -17.8160068 2.4330612 0 -17.8160068
#> 116 1804 30267935 13.7151910 2.1005754 10 13.7151910
#> 117 1810 30283727 -1.7774517 -11.2935062 0 -1.7774517
#> 118 1811 30283728 -2.2950370 -16.6502317 13 -2.2950370
#> 119 1815 30293300 19.7270055 13.3847235 8 19.7270055
#> 120 1829 30324465 27.2028492 2.2454084 3 27.2028492
#> 121 1830 30326457 -8.5202679 8.4572288 0 -8.5202679
#> 122 1839 30342281 -5.3061691 -11.1643613 18 -5.3061691
#> 123 1854 30377782 -4.3833362 -5.6289424 0 -4.3833362
#> 124 1858 30389972 8.7448753 -4.1489059 17 8.7448753
#> 125 1876 30451063 -5.7322991 0.4293188 30 -5.7322991
#> 126 1882 30462345 6.0973140 9.8947516 21 6.0973140
#> 127 1885 30472571 2.8656425 11.8929326 21 2.8656425
#> 128 1886 30473042 3.3004913 -7.7697583 0 3.3004913
#> 129 1889 30480896 3.6471501 -14.7363888 24 3.6471501
#> 130 1903 30530667 7.6029745 14.8776865 21 7.6029745
#> 131 1910 30544648 8.9213195 -5.8330066 17 8.9213195
#> 132 1912 30547396 13.1722228 1.8129995 10 13.1722228
#> 133 1914 30551689 -3.4316744 -1.2924932 30 -3.4316744
#> 134 1915 30552260 1.5590173 -7.7543023 26 1.5590173
#> 135 1926 30585297 5.2869092 -0.8867472 0 5.2869092
#> 136 1928 30594159 13.1298355 -12.3815756 15 13.1298355
#> 137 1948 30630441 -5.7227661 -2.9852598 0 -5.7227661
#> 138 1965 30667080 3.0580435 -2.8186220 0 3.0580435
#> 139 1966 30669721 -0.9695266 -19.5179670 12 -0.9695266
#> 140 1983 30703857 1.6007501 -6.8675244 0 1.6007501
#> 141 1983 30703857 1.6007501 -6.8675244 0 1.6007501
#> 142 1989 30727964 8.0339038 18.2378713 21 8.0339038
#> 143 1994 30745223 2.1478516 23.0172908 0 2.1478516
#> 144 1995 30746678 -4.5359099 14.8768468 19 -4.5359099
#> 145 1995 30746678 -4.5359099 14.8768468 19 -4.5359099
#> 146 1995 30746678 -4.5359099 14.8768468 19 -4.5359099
#> 147 1996 30747060 -4.9575794 -12.5417892 18 -4.9575794
#> 148 2010 30808574 -5.5977311 19.0849266 0 -5.5977311
#> 149 2014 30815151 14.0944370 -9.5550367 15 14.0944370
#> 150 2030 30864021 25.4417138 -0.1492616 3 25.4417138
#> 151 2042 30902508 10.8166798 -12.0718403 15 10.8166798
#> 152 2066 30973881 -8.9898124 -13.6966597 16 -8.9898124
#> 153 2072 30990472 10.3292165 20.0213295 21 10.3292165
#> 154 2073 30999846 22.4958054 -0.4844072 3 22.4958054
#> 155 2080 31043176 9.2185197 -6.0102941 17 9.2185197
#> 156 2082 31055701 1.9050152 2.8556660 23 1.9050152
#> 157 2092 31096368 -34.1989799 2.3096168 1 -34.1989799
#> 158 2093 31099672 5.3593085 -3.1133765 0 5.3593085
#> 159 2096 31107221 -5.3117313 -11.0002088 18 -5.3117313
#> 160 2097 31117831 10.2058499 15.8771467 0 10.2058499
#> 161 2100 31127115 -6.2148637 -14.4148720 0 -6.2148637
#> 162 2104 31149729 -0.5470288 -21.4062560 11 -0.5470288
#> 163 2107 31162343 -8.4970618 -24.4541188 2 -8.4970618
#> 164 2109 31200726 0.8347734 -6.9762980 0 0.8347734
#> 165 2124 31271966 -6.5002516 -10.1143325 18 -6.5002516
#> 166 2124 31271966 -6.5002516 -10.1143325 18 -6.5002516
#> 167 2126 31278189 2.6647358 -9.0318634 0 2.6647358
#> 168 2126 31278189 2.6647358 -9.0318634 0 2.6647358
#> 169 2129 31280419 25.8870399 6.9254184 0 25.8870399
#> 170 2132 31311528 12.4957199 -12.1981389 15 12.4957199
#> 171 2136 31323611 -34.1990033 2.3096414 1 -34.1990033
#> 172 2139 31332535 -5.2139483 13.0439445 19 -5.2139483
#> 173 2155 31395146 3.8720636 17.6469445 0 3.8720636
#> 174 2168 31438363 -0.8857242 -19.3992556 12 -0.8857242
#> 175 2185 31500215 5.8311808 12.9997847 21 5.8311808
#> 176 2209 31883553 7.0053058 -11.3773437 0 7.0053058
#> V3 clustered
#> 1 -19.7271766 clustered
#> 2 -19.7271766 clustered
#> 3 12.1871987 clustered
#> 4 14.1619297 not-clustered
#> 5 14.1619297 not-clustered
#> 6 3.7615436 not-clustered
#> 7 14.2637039 clustered
#> 8 13.7290262 clustered
#> 9 -12.1389331 clustered
#> 10 24.3542357 clustered
#> 11 -3.9828030 not-clustered
#> 12 -14.3746068 clustered
#> 13 -1.0198711 not-clustered
#> 14 -10.1483175 not-clustered
#> 15 -18.9493868 clustered
#> 16 -18.9326656 clustered
#> 17 9.2041832 not-clustered
#> 18 2.3829494 clustered
#> 19 12.5265021 clustered
#> 20 -9.6701763 not-clustered
#> 21 -3.1617573 clustered
#> 22 16.3966099 clustered
#> 23 19.2940582 clustered
#> 24 3.5906941 clustered
#> 25 0.9970058 not-clustered
#> 26 16.3198468 not-clustered
#> 27 11.5440025 clustered
#> 28 12.8277777 clustered
#> 29 13.3930875 clustered
#> 30 -9.5206653 clustered
#> 31 1.7443322 clustered
#> 32 18.7808677 clustered
#> 33 -6.1323074 not-clustered
#> 34 -1.0230197 clustered
#> 35 -1.5461911 clustered
#> 36 -5.5649371 clustered
#> 37 -3.8381548 not-clustered
#> 38 -13.0842509 not-clustered
#> 39 -3.8482034 not-clustered
#> 40 14.6370219 clustered
#> 41 11.0338098 clustered
#> 42 10.7895758 clustered
#> 43 19.6063307 not-clustered
#> 44 3.9638841 not-clustered
#> 45 -12.0828980 clustered
#> 46 12.2645485 clustered
#> 47 8.2312798 not-clustered
#> 48 16.6310472 not-clustered
#> 49 -4.2104052 not-clustered
#> 50 4.7389646 not-clustered
#> 51 7.0312553 clustered
#> 52 -9.2940665 clustered
#> 53 18.0658715 not-clustered
#> 54 12.4750716 clustered
#> 55 -8.4228412 clustered
#> 56 0.9229922 clustered
#> 57 10.8837881 clustered
#> 58 12.4273432 clustered
#> 59 -3.1114973 clustered
#> 60 -17.8680487 not-clustered
#> 61 -1.5147174 not-clustered
#> 62 5.1938524 clustered
#> 63 -6.1332039 not-clustered
#> 64 -26.2597285 clustered
#> 65 -6.9299882 clustered
#> 66 20.1757438 clustered
#> 67 -15.4986852 not-clustered
#> 68 11.3072299 clustered
#> 69 16.0276878 clustered
#> 70 -13.0799031 clustered
#> 71 9.7912510 clustered
#> 72 13.1403855 clustered
#> 73 -0.7067931 not-clustered
#> 74 -1.5060592 not-clustered
#> 75 -12.3162993 not-clustered
#> 76 -15.4734004 not-clustered
#> 77 -18.9138888 not-clustered
#> 78 -15.2919086 clustered
#> 79 12.1998295 not-clustered
#> 80 -22.0547287 clustered
#> 81 -15.5483583 not-clustered
#> 82 -19.9572929 clustered
#> 83 -1.1899253 clustered
#> 84 -2.4085651 clustered
#> 85 8.9912397 not-clustered
#> 86 -22.1501665 clustered
#> 87 -2.8685270 not-clustered
#> 88 -12.1987474 clustered
#> 89 1.2785310 clustered
#> 90 11.9998773 clustered
#> 91 13.0977150 clustered
#> 92 11.6141543 not-clustered
#> 93 -13.7227419 clustered
#> 94 0.4252312 clustered
#> 95 -14.2608588 clustered
#> 96 21.4353375 clustered
#> 97 20.0539956 clustered
#> 98 2.1722850 clustered
#> 99 -6.1103093 not-clustered
#> 100 4.2840866 clustered
#> 101 -20.0602806 not-clustered
#> 102 -20.0602806 not-clustered
#> 103 -7.8317835 not-clustered
#> 104 -11.4450530 not-clustered
#> 105 -16.5945497 clustered
#> 106 2.3096163 clustered
#> 107 -7.0123455 not-clustered
#> 108 1.4897928 clustered
#> 109 6.0619761 not-clustered
#> 110 13.7510284 clustered
#> 111 21.8091544 clustered
#> 112 -14.6834152 not-clustered
#> 113 13.0863658 not-clustered
#> 114 12.5539743 clustered
#> 115 2.4330612 not-clustered
#> 116 2.1005754 clustered
#> 117 -11.2935062 not-clustered
#> 118 -16.6502317 clustered
#> 119 13.3847235 clustered
#> 120 2.2454084 clustered
#> 121 8.4572288 not-clustered
#> 122 -11.1643613 clustered
#> 123 -5.6289424 not-clustered
#> 124 -4.1489059 clustered
#> 125 0.4293188 clustered
#> 126 9.8947516 clustered
#> 127 11.8929326 clustered
#> 128 -7.7697583 not-clustered
#> 129 -14.7363888 clustered
#> 130 14.8776865 clustered
#> 131 -5.8330066 clustered
#> 132 1.8129995 clustered
#> 133 -1.2924932 clustered
#> 134 -7.7543023 clustered
#> 135 -0.8867472 not-clustered
#> 136 -12.3815756 clustered
#> 137 -2.9852598 not-clustered
#> 138 -2.8186220 not-clustered
#> 139 -19.5179670 clustered
#> 140 -6.8675244 not-clustered
#> 141 -6.8675244 not-clustered
#> 142 18.2378713 clustered
#> 143 23.0172908 not-clustered
#> 144 14.8768468 clustered
#> 145 14.8768468 clustered
#> 146 14.8768468 clustered
#> 147 -12.5417892 clustered
#> 148 19.0849266 not-clustered
#> 149 -9.5550367 clustered
#> 150 -0.1492616 clustered
#> 151 -12.0718403 clustered
#> 152 -13.6966597 clustered
#> 153 20.0213295 clustered
#> 154 -0.4844072 clustered
#> 155 -6.0102941 clustered
#> 156 2.8556660 clustered
#> 157 2.3096168 clustered
#> 158 -3.1133765 not-clustered
#> 159 -11.0002088 clustered
#> 160 15.8771467 not-clustered
#> 161 -14.4148720 not-clustered
#> 162 -21.4062560 clustered
#> 163 -24.4541188 clustered
#> 164 -6.9762980 not-clustered
#> 165 -10.1143325 clustered
#> 166 -10.1143325 clustered
#> 167 -9.0318634 not-clustered
#> 168 -9.0318634 not-clustered
#> 169 6.9254184 not-clustered
#> 170 -12.1981389 clustered
#> 171 2.3096414 clustered
#> 172 13.0439445 clustered
#> 173 17.6469445 not-clustered
#> 174 -19.3992556 clustered
#> 175 12.9997847 clustered
#> 176 -11.3773437 not-clustered
#> clus_names
#> 1 scienc-opportun-scientif-research-studi-provid-develop-data
#> 2 scienc-opportun-scientif-research-studi-provid-develop-data
#> 3 epidemiologi-diseas-research-health-studi-data
#> 4 health-research-analysi-studi-base-data
#> 5 health-research-analysi-studi-base-data
#> 6 health-research-analysi-studi-base-data
#> 7 quot-research-health-inform-studi-data
#> 8 quot-research-health-inform-studi-data
#> 9 world-digit-health-patient-research-studi-develop-data
#> 10 quot-research-health-inform-studi-data
#> 11 health-research-analysi-studi-base-data
#> 12 medicin-precis-clinic-health-patient-develop-data
#> 13 health-research-analysi-studi-base-data
#> 14 health-research-analysi-studi-base-data
#> 15 ethic-opportun-challeng-clinic-research-data
#> 16 ethic-opportun-challeng-clinic-research-data
#> 17 health-research-analysi-studi-base-data
#> 18 method-analysi-base-result-studi-data
#> 19 epidemiologi-diseas-research-health-studi-data
#> 20 health-research-analysi-studi-base-data
#> 21 metric-method-provid-studi-base-data
#> 22 quot-research-health-inform-studi-data
#> 23 quot-research-health-inform-studi-data
#> 24 research-model-era-clinic-studi-base-data
#> 25 health-research-analysi-studi-base-data
#> 26 health-research-analysi-studi-base-data
#> 27 epidemiologi-diseas-research-health-studi-data
#> 28 epidemiologi-diseas-research-health-studi-data
#> 29 epidemiologi-diseas-research-health-studi-data
#> 30 health-system-outcom-research-develop-inform-data
#> 31 biologi-system-comput-analysi-approach-data
#> 32 quot-research-health-inform-studi-data
#> 33 health-research-analysi-studi-base-data
#> 34 cancer-patient-clinic-base-studi-data
#> 35 metric-method-provid-studi-base-data
#> 36 healthcar-health-research-challeng-inform-data
#> 37 health-research-analysi-studi-base-data
#> 38 health-research-analysi-studi-base-data
#> 39 health-research-analysi-studi-base-data
#> 40 peopl-health-time-evalu-research-studi-base-data
#> 41 epidemiologi-diseas-research-health-studi-data
#> 42 epidemiologi-diseas-research-health-studi-data
#> 43 health-research-analysi-studi-base-data
#> 44 health-research-analysi-studi-base-data
#> 45 googl-search-health-base-inform-data
#> 46 epidemiologi-diseas-research-health-studi-data
#> 47 health-research-analysi-studi-base-data
#> 48 health-research-analysi-studi-base-data
#> 49 health-research-analysi-studi-base-data
#> 50 health-research-analysi-studi-base-data
#> 51 model-perform-power-method-analysi-base-studi-data
#> 52 health-system-outcom-research-develop-inform-data
#> 53 health-research-analysi-studi-base-data
#> 54 epidemiologi-diseas-research-health-studi-data
#> 55 health-system-outcom-research-develop-inform-data
#> 56 method-advanc-promis-approach-clinic-base-develop-data
#> 57 epidemiologi-diseas-research-health-studi-data
#> 58 quot-research-health-inform-studi-data
#> 59 healthcar-health-research-challeng-inform-data
#> 60 health-research-analysi-studi-base-data
#> 61 health-research-analysi-studi-base-data
#> 62 extern-data-tension-landscap-8-facilit-medic-meet-ag-individu-strategi-result-health-analysi
#> 63 health-research-analysi-studi-base-data
#> 64 nurs-health-patient-care-research-data
#> 65 health-system-outcom-research-develop-inform-data
#> 66 quot-research-health-inform-studi-data
#> 67 health-research-analysi-studi-base-data
#> 68 peopl-health-time-evalu-research-studi-base-data
#> 69 quot-research-health-inform-studi-data
#> 70 world-digit-health-patient-research-studi-develop-data
#> 71 epidemiologi-diseas-research-health-studi-data
#> 72 quot-research-health-inform-studi-data
#> 73 health-research-analysi-studi-base-data
#> 74 health-research-analysi-studi-base-data
#> 75 health-research-analysi-studi-base-data
#> 76 health-research-analysi-studi-base-data
#> 77 health-research-analysi-studi-base-data
#> 78 scienc-analysi-system-process-method-health-studi-develop-data
#> 79 health-research-analysi-studi-base-data
#> 80 artifici-intellig-healthcar-scienc-health-data
#> 81 health-research-analysi-studi-base-data
#> 82 scienc-opportun-scientif-research-studi-provid-develop-data
#> 83 method-analysi-base-result-studi-data
#> 84 metric-method-provid-studi-base-data
#> 85 health-research-analysi-studi-base-data
#> 86 nurs-health-patient-care-research-data
#> 87 health-research-analysi-studi-base-data
#> 88 medicin-precis-clinic-health-patient-develop-data
#> 89 cancer-patient-clinic-base-studi-data
#> 90 peopl-health-time-evalu-research-studi-base-data
#> 91 peopl-health-time-evalu-research-studi-base-data
#> 92 health-research-analysi-studi-base-data
#> 93 medicin-precis-clinic-health-patient-develop-data
#> 94 cancer-patient-clinic-base-studi-data
#> 95 world-digit-health-patient-research-studi-develop-data
#> 96 quot-research-health-inform-studi-data
#> 97 quot-research-health-inform-studi-data
#> 98 method-analysi-base-result-studi-data
#> 99 health-research-analysi-studi-base-data
#> 100 dimension-model-method-base-result-data
#> 101 health-research-analysi-studi-base-data
#> 102 health-research-analysi-studi-base-data
#> 103 health-research-analysi-studi-base-data
#> 104 health-research-analysi-studi-base-data
#> 105 scienc-analysi-system-process-method-health-studi-develop-data
#> 106 na
#> 107 health-research-analysi-studi-base-data
#> 108 method-advanc-promis-approach-clinic-base-develop-data
#> 109 health-research-analysi-studi-base-data
#> 110 quot-research-health-inform-studi-data
#> 111 quot-research-health-inform-studi-data
#> 112 health-research-analysi-studi-base-data
#> 113 health-research-analysi-studi-base-data
#> 114 peopl-health-time-evalu-research-studi-base-data
#> 115 health-research-analysi-studi-base-data
#> 116 biologi-system-comput-analysi-approach-data
#> 117 health-research-analysi-studi-base-data
#> 118 scienc-analysi-system-process-method-health-studi-develop-data
#> 119 epidemiologi-diseas-research-health-studi-data
#> 120 cancer-patient-clinic-base-studi-data
#> 121 health-research-analysi-studi-base-data
#> 122 googl-search-health-base-inform-data
#> 123 health-research-analysi-studi-base-data
#> 124 ai-artifici-intellig-medic-technologi-health-clinic-data
#> 125 method-analysi-base-result-studi-data
#> 126 quot-research-health-inform-studi-data
#> 127 quot-research-health-inform-studi-data
#> 128 health-research-analysi-studi-base-data
#> 129 world-digit-health-patient-research-studi-develop-data
#> 130 quot-research-health-inform-studi-data
#> 131 ai-artifici-intellig-medic-technologi-health-clinic-data
#> 132 biologi-system-comput-analysi-approach-data
#> 133 method-analysi-base-result-studi-data
#> 134 health-system-outcom-research-develop-inform-data
#> 135 health-research-analysi-studi-base-data
#> 136 medicin-precis-clinic-health-patient-develop-data
#> 137 health-research-analysi-studi-base-data
#> 138 health-research-analysi-studi-base-data
#> 139 scienc-opportun-scientif-research-studi-provid-develop-data
#> 140 health-research-analysi-studi-base-data
#> 141 health-research-analysi-studi-base-data
#> 142 quot-research-health-inform-studi-data
#> 143 health-research-analysi-studi-base-data
#> 144 peopl-health-time-evalu-research-studi-base-data
#> 145 peopl-health-time-evalu-research-studi-base-data
#> 146 peopl-health-time-evalu-research-studi-base-data
#> 147 googl-search-health-base-inform-data
#> 148 health-research-analysi-studi-base-data
#> 149 medicin-precis-clinic-health-patient-develop-data
#> 150 cancer-patient-clinic-base-studi-data
#> 151 medicin-precis-clinic-health-patient-develop-data
#> 152 power-understand-challeng-develop-approach-inform-data
#> 153 quot-research-health-inform-studi-data
#> 154 cancer-patient-clinic-base-studi-data
#> 155 ai-artifici-intellig-medic-technologi-health-clinic-data
#> 156 research-model-era-clinic-studi-base-data
#> 157 na
#> 158 health-research-analysi-studi-base-data
#> 159 googl-search-health-base-inform-data
#> 160 health-research-analysi-studi-base-data
#> 161 health-research-analysi-studi-base-data
#> 162 artifici-intellig-healthcar-scienc-health-data
#> 163 nurs-health-patient-care-research-data
#> 164 health-research-analysi-studi-base-data
#> 165 googl-search-health-base-inform-data
#> 166 googl-search-health-base-inform-data
#> 167 health-research-analysi-studi-base-data
#> 168 health-research-analysi-studi-base-data
#> 169 health-research-analysi-studi-base-data
#> 170 medicin-precis-clinic-health-patient-develop-data
#> 171 na
#> 172 peopl-health-time-evalu-research-studi-base-data
#> 173 health-research-analysi-studi-base-data
#> 174 scienc-opportun-scientif-research-studi-provid-develop-data
#> 175 quot-research-health-inform-studi-data
#> 176 health-research-analysi-studi-base-data
#> title
#> 1 Rhode Island Department of Health Strategic Plan: 2004-2010. Goal 3: Public health data, science & information.
#> 2 Rhode Island Department of Health Strategic Plan: 2004-2010. Goal 3: Public health data, science & information.
#> 3 Transforming epidemiology for 21st century medicine and public health.
#> 4 Big bad data: law, public health, and biomedical databases.
#> 5 Big bad data: law, public health, and biomedical databases.
#> 6 Prevention and management of noncommunicable disease: the IOC Consensus Statement, Lausanne 2013.
#> 7 Methods of using real-time social media technologies for detection and remote monitoring of HIV outcomes.
#> 8 The roles of federal legislation and evolving health care systems in promoting medical-dental collaboration.
#> 9 Health privacy is difficult but not impossible in a post-HIPAA data-driven world.
#> 10 Using "big data" to optimize public health outreach: answering the call to action.
#> 11 A bright future: innovation transforming public health in Chicago.
#> 12 Medicine. Big data meets public health.
#> 13 Ethical issues in using Twitter for public health surveillance and research: developing a taxonomy of ethical concepts from the research literature.
#> 14 Converting Big Data into public health.
#> 15 Ethical challenges of big data in public health.
#> 16 Confronting the ethical challenges of big data in public health.
#> 17 Big data and public health: navigating privacy laws to maximize potential.
#> 18 Big data! Big deal?
#> 19 Commentary: Epidemiology in the era of big data.
#> 20 Public policy response, aging in place, and big data platforms: Creating an effective collaborative system to cope with aging of the population.
#> 21 A new source of data for public health surveillance: Facebook likes.
#> 22 Charting a future for epidemiologic training.
#> 23 Pre-Clinical Traumatic Brain Injury Common Data Elements: Toward a Common Language Across Laboratories.
#> 24 How Automation Can Help Alleviate the Budget Crunch in Public Health Research.
#> 25 [Google Flu Trends--the initial application of big data in public health].
#> 26 Implementation of a web based universal exchange and inference language for medicine: Sparse data, probabilities and inference in data mining of clinical data repositories.
#> 27 From Smallpox to Big Data: The Next 100 Years of Epidemiologic Methods.
#> 28 Epidemiology: Then and Now.
#> 29 [Risks of the use of big data in research in public health and epidemiology].
#> 30 [Big data in health in Spain: now is the time for a national strategy].
#> 31 [Big data in the promotion of public health].
#> 32 The Online Dissemination of Nature-Health Concepts: Lessons from Sentiment Analysis of Social Media Relating to "Nature-Deficit Disorder".
#> 33 OpenHealth Platform for Interactive Contextualization of Population Health Open Data.
#> 34 [The root of the deep and fast ongoing evolution of both structure and methodology of clinical research].
#> 35 Applying probabilistic temporal and multisite data quality control methods to a public health mortality registry in Spain: a systematic approach to quality control of repositories.
#> 36 Moving microbiota research toward establishing causal associations that represent viable targets for effective public health interventions.
#> 37 Feasibility of Population Health Analytics and Data Visualization for Decision Support in the Infectious Diseases Domain: A pilot study.
#> 38 Real-World Data: Policy Issues Regarding their Access and Use.
#> 39 Policy Surveillance: A Vital Public Health Practice Comes of Age.
#> 40 Real-time Medical Emergency Response System: Exploiting IoT and Big Data for Public Health.
#> 41 Public Health and Epidemiology Informatics.
#> 42 Use of big data for drug development and for public and personal health and care.
#> 43 THE TRAINING OF NEXT GENERATION DATA SCIENTISTS IN BIOMEDICINE.
#> 44 Applying Multiple Data Collection Tools to Quantify Human Papillomavirus Vaccine Communication on Twitter.
#> 45 Public health awareness of autoimmune diseases after the death of a celebrity.
#> 46 Applied epidemiology and public health: are we training the future generations appropriately?
#> 47 CSDC: a nationwide screening platform for stroke control and prevention in China.
#> 48 Public health and precision medicine share a goal.
#> 49 Big data or bust: realizing the microbial genomics revolution.
#> 50 Social Media Monitoring of Discrimination and HIV Testing in Brazil, 2014-2015.
#> 51 Big-data-driven modeling unveils country-wide drivers of endemic schistosomiasis.
#> 52 How to work with local communities to improve population health: big data and small data.
#> 53 High-Resolution Air Pollution Mapping with Google Street View Cars: Exploiting Big Data.
#> 54 Ethics, big data and computing in epidemiology and public health.
#> 55 The EVOTION Decision Support System: Utilizing It for Public Health Policy-Making in Hearing Loss.
#> 56 A review of recent advances in data analytics for post-operative patient deterioration detection.
#> 57 Mind the Scales: Harnessing Spatial Big Data for Infectious Disease Surveillance and Inference.
#> 58 Longitudinal Study-Based Dementia Prediction for Public Health.
#> 59 A glossary for big data in population and public health: discussion and commentary on terminology and research methods.
#> 60 A global perspective on evolving bioinformatics and data science training needs.
#> 61 Using Big Data to Reveal Chronic Respiratory Disease Mortality Patterns and Identify Potential Public Health Interventions.
#> 62 [Communication and Networking - Results of the Working Group 8 of the Forum Future Public Health, Berlin 2016].
#> 63 Exploring completeness in clinical data research networks with DQe-c.
#> 64 GeoMed 2017: deeper insight from big data and small areas.
#> 65 [Big Data and Public Health - Results of the Working Group 1 of the Forum Future Public Health, Berlin 2016].
#> 66 Internet-based biosurveillance methods for vector-borne diseases: Are they novel public health tools or just novelties?
#> 67 Semantics-Powered Healthcare Engineering and Data Analytics.
#> 68 Improved Diagnosis and Care for Rare Diseases through Implementation of Precision Public Health Framework.
#> 69 Yale school of public health symposium on lifetime exposures and human health: the exposome; summary and future reflections.
#> 70 Big Data Knowledge in Global Health Education.
#> 71 Finland establishing the internet of genomics and health data.
#> 72 Big Data in Public Health: Terminology, Machine Learning, and Privacy.
#> 73 CMS Data: One Road to Quality Data Analytics.
#> 74 The importance of data structure in statistical analysis of dendritic spine morphology.
#> 75 Real world big data for clinical research and drug development.
#> 76 God is watching: history in the age of near-infinite digital archives.
#> 77 ToxicDocs: a new resource for assessing the impact of corporate practices on health.
#> 78 ToxicDocs (www.ToxicDocs.org) goes live: A giant step toward leveling the playing field for efforts to combat toxic exposures.
#> 79 Browsing a corporation's mind.
#> 80 ToxicDocs and the fight against biased public health science worldwide.
#> 81 The value of not being lost in our digital world.
#> 82 ToxicDocs: using the US legal system to confront industries' systematic counterattacks against public health.
#> 83 An Online Risk Index for the Cross-Sectional Prediction of New HIV Chlamydia, and Gonorrhea Diagnoses Across U.S. Counties and Across Years.
#> 84 OpenStreetMap data for alcohol research: Reliability assessment and quality indicators.
#> 85 Clinical validation of a public health policy-making platform for hearing loss (EVOTION): protocol for a big data study.
#> 86 Nursing Theory, Terminology, and Big Data: Data-Driven Discovery of Novel Patterns in Archival Randomized Clinical Trial Data.
#> 87 Infection forecasts powered by big data.
#> 88 Big Data Analytic, Big Step for Patient Management and Care in Puerto Rico.
#> 89 Global Cancer Clinical Trials-Cooperation Between Investigators in High-Income Countries and Low- and Middle-Income Countries.
#> 90 Quantifying the propagation of distress and mental disorders in social networks.
#> 91 How do people in different places experience different levels of air pollution? Using worldwide Chinese as a lens.
#> 92 Complex analyses on clinical information systems using restricted natural language querying to resolve time-event dependencies.
#> 93 Precision Medicine: From Science To Value.
#> 94 Incidence and risk factors for congestive heart failure in patients with early breast cancer who received anthracycline and/or trastuzumab: a big data analysis of the Korean Health Insurance Review and Assessment service database.
#> 95 Innovation at the Intersection of Clinical Trials and Real-World Data Science to Advance Patient Care.
#> 96 Expectations and boundaries for Big Data approaches in social medicine.
#> 97 Individuals on alert: digital epidemiology and the individualization of surveillance.
#> 98 Analysis of consumer food purchase data used for outbreak investigations, a review.
#> 99 Socioeconomic disparities in abdominal obesity over the life course in China.
#> 100 Data-Driven Clustering Reveals a Link Between Symptoms and Functional Brain Connectivity in Depression.
#> 101 Over and under-regulation in the Colorado Cannabis industry - A data-analytic perspective.
#> 102 Over and under-regulation in the Colorado Cannabis industry - A data-analytic perspective.
#> 103 Priorities to Overcome Barriers Impacting Data Science Application in Emergency Care Research.
#> 104 Algorithm for comorbidities, associations, length of stay and mortality (ACALM).
#> 105 A data science approach to predicting patient aggressive events in a psychiatric hospital.
#> 106 <NA>
#> 107 European perspectives on big data applied to health: The case of biobanks and human databases.
#> 108 Anti-Racism Methods for Big Data Research: Lessons Learned from the HIV Testing, Linkage, & Retention in Care (HIV TLR) Study.
#> 109 Mapping the Flow of Pediatric Trauma Patients Using Process Mining.
#> 110 Big Data and the Opioid Crisis: Balancing Patient Privacy with Public Health.
#> 111 Public and Population Health Informatics: The Bridging of Big Data to Benefit Communities.
#> 112 Big Data Based m-Health Application to Prevent Health Hazards: A Design Science Framework.
#> 113 Don't forget people in the use of big data for development.
#> 114 Dynamics of the HIV outbreak and response in Scott County, IN, USA, 2011-15: a modelling study.
#> 115 Orderliness predicts academic performance: behavioural analysis on campus lifestyle.
#> 116 Structural biology meets data science: does anything change?
#> 117 Aspects of Data Ethics in a Changing World: Where Are We Now?
#> 118 Data-Driven Investment Strategies for Peer-to-Peer Lending: A Case Study for Teaching Data Science.
#> 119 [Spatial epidemiology plays an important role in control and prevention of diseases].
#> 120 ASO Author Reflections: Enabling Optimised Delivery of Patient-Centred Cancer Care Using Artificial Intelligence and Data Analytics.
#> 121 Data-driven competitive facilitative tree interactions and their implications on nature-based solutions.
#> 122 Monitoring public interest toward pertussis outbreaks: an extensive Google Trends-based analysis.
#> 123 Comparison of data science workflows for root cause analysis of bioprocesses.
#> 124 Data-driven Classification of the 3D Spinal Curve in Adolescent Idiopathic Scoliosis with an Applications in Surgical Outcome Prediction.
#> 125 Data Analytics and Modeling for Appointment No-show in Community Health Centers.
#> 126 Toward a National Conversation on Health: Disruptive Intervention and the Transformation from Health Care to Health.
#> 127 Using a data science approach to predict cocaine use frequency from depressive symptoms.
#> 128 Learning health systems.
#> 129 Beyond the EHR money pit: After investing big in health records, systems still face growing IT needs for upgrades, analytics and patient engagement.
#> 130 Skill discrepancies between research, education, and jobs reveal the critical need to supply soft skills for the data economy.
#> 131 Artificial Intelligence and Big Data in Public Health.
#> 132 Overview and Evaluation of Recent Methods for Statistical Inference of Gene Regulatory Networks from Time Series Data.
#> 133 An Approach Towards Reducing Road Traffic Injuries and Improving Public Health Through Big Data Telematics: A Randomised Controlled Trial Protocol.
#> 134 Comprehensive scoping review of health research using social media data.
#> 135 A comparison of information sharing behaviours across 379 health conditions on Twitter.
#> 136 Big data hurdles in precision medicine and precision public health.
#> 137 INTERACT: A comprehensive approach to assess urban form interventions through natural experiments.
#> 138 Adjusting the focus: A public health ethics approach to data research.
#> 139 [Data science in large cohort studies].
#> 140 Population data science: advancing the safe use of population data for public benefit.
#> 141 Population data science: advancing the safe use of population data for public benefit.
#> 142 Precision public health to inhibit the contagion of disease and move toward a future in which microbes spread health.
#> 143 Pitfalls in big data analysis: next-generation technologies, last-generation data.
#> 144 Public health issues in the 21st century: National challenges and shared challenges for the Maghreb countries.
#> 145 Public health issues in the 21st century: National challenges and shared challenges for the Maghreb countries.
#> 146 Public health issues in the 21st century: National challenges and shared challenges for the Maghreb countries.
#> 147 Nutritional Culturomics and Big Data: Macroscopic Patterns of Change in Food, Nutrition and Diet Choices.
#> 148 Exploration, Inference, and Prediction in Neuroscience and Biomedicine.
#> 149 Developing a Clinico-Molecular Test for Individualized Treatment of Ovarian Cancer: The interplay of Precision Medicine Informatics with Clinical and Health Economics Dimensions.
#> 150 Applying Data Science methods and tools to unveil healthcare use of lung cancer patients in a teaching hospital in Spain.
#> 151 [Analysis of the discrimination of the final marks after the first computerized national ranking exam in Medicine in June 2016 in France].
#> 152 Reproducible big data science: A case study in continuous FAIRness.
#> 153 Big Data in occupational medicine: the convergence of -omics sciences, participatory research and e-health.
#> 154 VIGLA-M: visual gene expression data analytics.
#> 155 An overview of GeoAI applications in health and healthcare.
#> 156 Severe Maternal Morbidity, A Tale of 2 States Using Data for Action-Ohio and Massachusetts.
#> 157 <NA>
#> 158 Using Big Data and Predictive Analytics to Determine Patient Risk in Oncology.
#> 159 Using Big Data to Monitor the Introduction and Spread of Chikungunya, Europe, 2017.
#> 160 Intelligent health data analytics: A convergence of artificial intelligence and big data.
#> 161 Uncovering the structure of self-regulation through data-driven ontology discovery.
#> 162 Artificial intelligence in dermato-oncology: a joint clinical and data science perspective.
#> 163 Ripe for Disruption? Adopting Nurse-Led Data Science and Artificial Intelligence to Predict and Reduce Hospital-Acquired Outcomes in the Learning Health System.
#> 164 Impact of using a broad-based multi-institutional approach to build capacity for non-communicable disease research in Thailand.
#> 165 An investigation of the features facilitating effective collaboration between public health experts and data scientists at a hackathon.
#> 166 An investigation of the features facilitating effective collaboration between public health experts and data scientists at a hackathon.
#> 167 Applying Data Analytics to Address Social Determinants of Health in Practice.
#> 168 Applying Data Analytics to Address Social Determinants of Health in Practice.
#> 169 Editorial.
#> 170 Why we need a small data paradigm.
#> 171 <NA>
#> 172 Decision-Making based on Big Data Analytics for People Management in Healthcare Organizations.
#> 173 How Big Data Science Can Improve Linkage and Retention in Care.
#> 174 Informatics and Data Science for the Precision in Symptom Self-Management Center.
#> 175 Acute Health Impacts of the Southeast Asian Transboundary Haze Problem-A Review.
#> 176 Pharmacoepidemiology and Big Data Analytics: Challenges and Opportunities when Moving towards Precision Medicine.
#> abstract
#> 1
#> 2
#> 3 In 2012, the National Cancer Institute (NCI) engaged the scientific community to provide a vision for cancer epidemiology in the 21st century. Eight overarching thematic recommendations, with proposed corresponding actions for consideration by funding agencies, professional societies, and the research community emerged from the collective intellectual discourse. The themes are (i) extending the reach of epidemiology beyond discovery and etiologic research to include multilevel analysis, intervention evaluation, implementation, and outcomes research; (ii) transforming the practice of epidemiology by moving toward more access and sharing of protocols, data, metadata, and specimens to foster collaboration, to ensure reproducibility and replication, and accelerate translation; (iii) expanding cohort studies to collect exposure, clinical, and other information across the life course and examining multiple health-related endpoints; (iv) developing and validating reliable methods and technologies to quantify exposures and outcomes on a massive scale, and to assess concomitantly the role of multiple factors in complex diseases; (v) integrating "big data" science into the practice of epidemiology; (vi) expanding knowledge integration to drive research, policy, and practice; (vii) transforming training of 21st century epidemiologists to address interdisciplinary and translational research; and (viii) optimizing the use of resources and infrastructure for epidemiologic studies. These recommendations can transform cancer epidemiology and the field of epidemiology, in general, by enhancing transparency, interdisciplinary collaboration, and strategic applications of new technologies. They should lay a strong scientific foundation for accelerated translation of scientific discoveries into individual and population health benefits.
#> 4 The accelerating adoption of electronic health record (EHR) systems will have far-reaching implications for public health research and surveillance, which in turn could lead to changes in public policy, statutes, and regulations. The public health benefits of EHR use can be significant. However, researchers and analysts who rely on EHR data must proceed with caution and understand the potential limitations of EHRs. Because of clinicians' workloads, poor user-interface design, and other factors, EHR data can be erroneous, miscoded, fragmented, and incomplete. In addition, public health findings can be tainted by the problems of selection bias, confounding bias, and measurement bias. These flaws may become all the more troubling and important in an era of electronic "big data," in which a massive amount of information is processed automatically, without human checks. Thus, we conclude the paper by outlining several regulatory and other interventions to address data analysis difficulties that could result in invalid conclusions and unsound public health policies.
#> 5 The accelerating adoption of electronic health record (EHR) systems will have far-reaching implications for public health research and surveillance, which in turn could lead to changes in public policy, statutes, and regulations. The public health benefits of EHR use can be significant. However, researchers and analysts who rely on EHR data must proceed with caution and understand the potential limitations of EHRs. Because of clinicians' workloads, poor user-interface design, and other factors, EHR data can be erroneous, miscoded, fragmented, and incomplete. In addition, public health findings can be tainted by the problems of selection bias, confounding bias, and measurement bias. These flaws may become all the more troubling and important in an era of electronic "big data," in which a massive amount of information is processed automatically, without human checks. Thus, we conclude the paper by outlining several regulatory and other interventions to address data analysis difficulties that could result in invalid conclusions and unsound public health policies.
#> 6 Morbidity and mortality from preventable, noncommunicable chronic disease (NCD) threatens the health of our populations and our economies. The accumulation of vast amounts of scientific knowledge has done little to change this. New and innovative thinking is essential to foster new creative approaches that leverage and integrate evidence through the support of big data, technology, and design thinking. The purpose of this paper is to summarize the results of a consensus meeting on NCD prevention sponsored by the International Olympic Committee (IOC) in April 2013. Within the context of advocacy for multifaceted systems change, the IOC's focus is to create solutions that gain traction within health care systems. The group of participants attending the meeting achieved consensus on a strategy for the prevention and management of chronic disease that includes the following: 1. Focus on behavioral change as the core component of all clinical programs for the prevention and management of chronic disease. 2. Establish actual centers to design, implement, study, and improve preventive programs for chronic disease. 3. Use human-centered design (HCD) in the creation of prevention programs with an inclination to action, rapid prototyping and multiple iterations. 4. Extend the knowledge and skills of Sports and Exercise Medicine (SEM) professionals to build new programs for the prevention and treatment of chronic disease focused on physical activity, diet, and lifestyle. 5. Mobilize resources and leverage networks to scale and distribute programs of prevention. True innovation lies in the ability to align thinking around these core strategies to ensure successful implementation of NCD prevention and management programs within health care. The IOC and SEM community are in an ideal position to lead this disruptive change. The outcome of the consensus meeting was the creation of the IOC Non-Communicable Diseases ad hoc Working Group charged with the responsibility of moving this agenda forward.
#> 7 OBJECTIVE: Recent availability of "big data" might be used to study whether and how sexual risk behaviors are communicated on real-time social networking sites and how data might inform HIV prevention and detection. This study seeks to establish methods of using real-time social networking data for HIV prevention by assessing 1) whether geolocated conversations about HIV risk behaviors can be extracted from social networking data, 2) the prevalence and content of these conversations, and 3) the feasibility of using HIV risk-related real-time social media conversations as a method to detect HIV outcomes.METHODS: In 2012, tweets (N=553,186,061) were collected online and filtered to include those with HIV risk-related keywords (e.g., sexual behaviors and drug use). Data were merged with AIDSVU data on HIV cases. Negative binomial regressions assessed the relationship between HIV risk tweeting and prevalence by county, controlling for socioeconomic status measures.RESULTS: Over 9800 geolocated tweets were extracted and used to create a map displaying the geographical location of HIV-related tweets. There was a significant positive relationship (p<.01) between HIV-related tweets and HIV cases.CONCLUSION: Results suggest the feasibility of using social networking data as a method for evaluating and detecting Human immunodeficiency virus (HIV) risk behaviors and outcomes.
#> 8 Recent federal health care legislation contains explicit and implicit drivers for medical-dental collaboration. These laws implicitly promote health care evolution through value-based financing, "big data" and health information technology, increased number of care providers and a more holistic approach. Additional changes--practice aggregation, consumerism and population health perspectives--may also influence dental care. While dentistry will likely lag behind medicine toward value-based and accountable care organizations, dentists will be affected by changing consumer expectations.
#> 9 In the 13 years since their promulgation, the Health Insurance Portability and Accountability Act (HIPAA) rules and their enforcement have shown considerable evolution, as has the context within which they operate. Increasingly, it is the health information circulating outside the HIPAA-protected zone that is concerning: big data based on HIPAA data that have been acquired by public health agencies and then sold; medically inflected data collected from transactions or social media interactions; and the health data curated by patients, such as personal health records or data stored on smartphones. HIPAA does little here, suggesting that the future of health privacy may well be at the state level unless technology or federal legislation can catch up with state-of-the-art privacy regimes, such as the latest proposals from the European Commission.
#> 10
#> 11 Big cities continue to be centers for innovative solutions and services. Governments are quickly identifying opportunities to take advantage of this energy and revolutionize the means by which they deliver services to the public. The governmental public health sector is rapidly evolving in this respect, and Chicago is an emerging example of some of the changes to come. Governments are gradually adopting innovative informatics and big data tools and strategies, led by pioneering jurisdictions that are piecing together the standards, policy frameworks, and leadership structures fundamental to effective analytics use. They give an enticing glimpse of the technology's potential and a sense of the challenges that stand in the way. This is a rapidly evolving environment, and cities can work with partners to capitalize on the innovative energies of civic tech communities, health care systems, and emerging markets to introduce new methods to solve old problems.
#> 12
#> 13 BACKGROUND: The rise of social media and microblogging platforms in recent years, in conjunction with the development of techniques for the processing and analysis of "big data", has provided significant opportunities for public health surveillance using user-generated content. However, relatively little attention has been focused on developing ethically appropriate approaches to working with these new data sources.OBJECTIVE: Based on a review of the literature, this study seeks to develop a taxonomy of public health surveillance-related ethical concepts that emerge when using Twitter data, with a view to: (1) explicitly identifying a set of potential ethical issues and concerns that may arise when researchers work with Twitter data, and (2) providing a starting point for the formation of a set of best practices for public health surveillance through the development of an empirically derived taxonomy of ethical concepts.METHODS: We searched Medline, Compendex, PsycINFO, and the Philosopher's Index using a set of keywords selected to identify Twitter-related research papers that reference ethical concepts. Our initial set of queries identified 342 references across the four bibliographic databases. We screened titles and abstracts of these references using our inclusion/exclusion criteria, eliminating duplicates and unavailable papers, until 49 references remained. We then read the full text of these 49 articles and discarded 36, resulting in a final inclusion set of 13 articles. Ethical concepts were then identified in each of these 13 articles. Finally, based on a close reading of the text, a taxonomy of ethical concepts was constructed based on ethical concepts discovered in the papers.RESULTS: From these 13 articles, we iteratively generated a taxonomy of ethical concepts consisting of 10 top level categories: privacy, informed consent, ethical theory, institutional review board (IRB)/regulation, traditional research vs Twitter research, geographical information, researcher lurking, economic value of personal information, medical exceptionalism, and benefit of identifying socially harmful medical conditions.CONCLUSIONS: In summary, based on a review of the literature, we present a provisional taxonomy of public health surveillance-related ethical concepts that emerge when using Twitter data.
#> 14
#> 15
#> 16
#> 17
#> 18
#> 19 Big Data has increasingly been promoted as a revolutionary development in the future of science, including epidemiology. However, the definition and implications of Big Data for epidemiology remain unclear. We here provide a working definition of Big Data predicated on the so-called "three V's": variety, volume, and velocity. From this definition, we argue that Big Data has evolutionary and revolutionary implications for identifying and intervening on the determinants of population health. We suggest that as more sources of diverse data become publicly available, the ability to combine and refine these data to yield valid answers to epidemiologic questions will be invaluable. We conclude that while epidemiology as practiced today will continue to be practiced in the Big Data future, a component of our field's future value lies in integrating subject matter knowledge with increased technical savvy. Our training programs and our visions for future public health interventions should reflect this future.
#> 20 The unprecedented rapid aging of the population is poised to become the next global public health challenge, as is apparent by the fact that 23.1% of the total global burden of disease is attributable to disorders in people aged 60 years and older. Aging of the population is the biggest driver of substantial increases in the prevalence of chronic conditions, and the prevalence of multi-morbidity is much higher in older age groups. This places a large burden on countries' health and long-term care systems. Many behavioral changes and public policy responses to aging of the population have been implemented to cope with these challenges. A system of "aging in place" has been implemented in some high-income countries in order to better provide coordinated and cost-effective health services for the elderly. This approach reduces institutional care while supporting home- or community-based care and other services. Advances in information and communications technology (ICT), assistive devices, medical diagnostics, and interventions offer many ways of more efficiently providing long-term care as part of aging in place. The use of big data on a web services platform in an effective collaborative system should promote systematic data gathering to integrate clinical and public health information systems to provide support across the continuum of care. However, the use of big data in collaborative system is a double-edged sword, as it also bring challenges for information sharing, standardized data gathering, and the security of personal information, that warrant full attention.
#> 21 BACKGROUND: Investigation into personal health has become focused on conditions at an increasingly local level, while response rates have declined and complicated the process of collecting data at an individual level. Simultaneously, social media data have exploded in availability and have been shown to correlate with the prevalence of certain health conditions.OBJECTIVE: Facebook likes may be a source of digital data that can complement traditional public health surveillance systems and provide data at a local level. We explored the use of Facebook likes as potential predictors of health outcomes and their behavioral determinants.METHODS: We performed principal components and regression analyses to examine the predictive qualities of Facebook likes with regard to mortality, diseases, and lifestyle behaviors in 214 counties across the United States and 61 of 67 counties in Florida. These results were compared with those obtainable from a demographic model. Health data were obtained from both the 2010 and 2011 Behavioral Risk Factor Surveillance System (BRFSS) and mortality data were obtained from the National Vital Statistics System.RESULTS: Facebook likes added significant value in predicting most examined health outcomes and behaviors even when controlling for age, race, and socioeconomic status, with model fit improvements (adjusted R(2)) of an average of 58% across models for 13 different health-related metrics over basic sociodemographic models. Small area data were not available in sufficient abundance to test the accuracy of the model in estimating health conditions in less populated markets, but initial analysis using data from Florida showed a strong model fit for obesity data (adjusted R(2)=.77).CONCLUSIONS: Facebook likes provide estimates for examined health outcomes and health behaviors that are comparable to those obtained from the BRFSS. Online sources may provide more reliable, timely, and cost-effective county-level data than that obtainable from traditional public health surveillance systems as well as serve as an adjunct to those systems.
#> 22 PURPOSE: To identify macro-level trends that are changing the needs of epidemiologic research and practice and to develop and disseminate a set of competencies and recommendations for epidemiologic training that will be responsive to these changing needs. METHODS: There were three stages to the project: (1) assembling of a working group of senior epidemiologists from multiple sectors, (2) identifying relevant literature, and (3) conducting key informant interviews with 15 experienced epidemiologists. RESULTS: Twelve macro trends were identified along with associated actions for the field and educational competencies. The macro trends include the following: (1) "Big Data" or informatics, (2) the changing health communication environment, (3) the Affordable Care Act or health care system reform, (4) shifting demographics, (5) globalization, (6) emerging high-throughput technologies (omics), (7) a greater focus on accountability, (8) privacy changes, (9) a greater focus on "upstream" causes of disease, (10) the emergence of translational sciences, (11) the growing centrality of team and transdisciplinary science, and (12) the evolving funding environment. CONCLUSIONS: Addressing these issues through curricular change is needed to allow the field of epidemiology to more fully reach and sustain its full potential to benefit population health and remain a scientific discipline that makes critical contributions toward ensuring clinical, social, and population health.
#> 23 Traumatic brain injury (TBI) is a major public health issue exacting a substantial personal and economic burden globally. With the advent of "big data" approaches to understanding complex systems, there is the potential to greatly accelerate knowledge about mechanisms of injury and how to detect and modify them to improve patient outcomes. High quality, well-defined data are critical to the success of bioinformatics platforms, and a data dictionary of "common data elements" (CDEs), as well as "unique data elements" has been created for clinical TBI research. There is no data dictionary, however, for preclinical TBI research despite similar opportunities to accelerate knowledge. To address this gap, a committee of experts was tasked with creating a defined set of data elements to further collaboration across laboratories and enable the merging of data for meta-analysis. The CDEs were subdivided into a Core module for data elements relevant to most, if not all, studies, and Injury-Model-Specific modules for non-generalizable data elements. The purpose of this article is to provide both an overview of TBI models and the CDEs pertinent to these models to facilitate a common language for preclinical TBI research.
#> 24 In an era of severe funding constraints for public health research, more efficient means of conducting research will be needed if scientific progress is to continue. At present major funders, such as the National Institutes of Health, do not provide specific instructions to grant authors or to reviewers regarding the cost efficiency of the research that they conduct. Doing so could potentially allow more research to be funded within current budgetary constraints and reduce waste. I describe how a blinded randomized trial was conducted for $ 275,000 by completely automating the consent and data collection processes. The study used the participants' own computer equipment, relied on big data for outcomes, and outsourced some costly tasks, potentially saving $1 million in research costs.
#> 25 Google Flu Trends (GFT) was the first application of big data in the public health field. GFT was open online in 2009 and attracted worldwide attention immediately. However, GFT failed catching the 2009 pandemic H1N1 and kept overestimating the intensity of influenza-like illness in the 2012-2014 season in the United States. GFT model has been updated for three times since 2009, making its prediction bias controlled. Here, we summarized the mechanism GFT worked, the strategy GFT used to update, and its influence on public health.
#> 26 We extend Q-UEL, our universal exchange language for interoperability and inference in healthcare and biomedicine, to the more traditional fields of public health surveys. These are the type associated with screening, epidemiological and cross-sectional studies, and cohort studies in some cases similar to clinical trials. There is the challenge that there is some degree of split between frequentist notions of probability as (a) classical measures based only on the idea of counting and proportion and on classical biostatistics as used in the above conservative disciplines, and (b) more subjectivist notions of uncertainty, belief, reliability, or confidence often used in automated inference and decision support systems. Samples in the above kind of public health survey are typically small compared with our earlier "Big Data" mining efforts. An issue addressed here is how much impact on decisions should sparse data have. We describe a new Q-UEL compatible toolkit including a data analytics application DiracMiner that also delivers more standard biostatistical results, DiracBuilder that uses its output to build Hyperbolic Dirac Nets (HDN) for decision support, and HDNcoherer that ensures that probabilities are mutually consistent. Use is exemplified by participating in a real word health-screening project, and also by deployment in a industrial platform called the BioIngine, a cognitive computing platform for health management.
#> 27 For more than a century, epidemiology has seen major shifts in both focus and methodology. Taking into consideration the explosion of "big data," the advent of more sophisticated data collection and analytical tools, and the increased interest in evidence-based solutions, we present a framework that summarizes 3 fundamental domains of epidemiologic methods that are relevant for the understanding of both historical contributions and future directions in public health. First, the manner in which populations and their follow-up are defined is expanding, with greater interest in online populations whose definition does not fit the usual classification by person, place, and time. Second, traditional data collection methods, such as population-based surveillance and individual interviews, have been supplemented with advances in measurement. From biomarkers to mobile health, innovations in the measurement of exposures and diseases enable refined accuracy of data collection. Lastly, the comparison of populations is at the heart of epidemiologic methodology. Risk factor epidemiology, prediction methods, and causal inference strategies are areas in which the field is continuing to make significant contributions to public health. The framework presented herein articulates the multifaceted ways in which epidemiologic methods make such contributions and can continue to do so as we embark upon the next 100 years.
#> 28 Twenty-five years ago, on the 75th anniversary of the Johns Hopkins Bloomberg School of Public Health, I noted that epidemiologic research was moving away from the traditional approaches used to investigate "epidemics" and their close relationship with preventive medicine. Twenty-five years later, the role of epidemiology as an important contribution to human population research, preventive medicine, and public health is under substantial pressure because of the emphasis on "big data," phenomenology, and personalized medical therapies. Epidemiology is the study of epidemics. The primary role of epidemiology is to identify the epidemics and parameters of interest of host, agent, and environment and to generate and test hypotheses in search of causal pathways. Almost all diseases have a specific distribution in relation to time, place, and person and specific "causes" with high effect sizes. Epidemiology then uses such information to develop interventions and test (through clinical trials and natural experiments) their efficacy and effectiveness. Epidemiology is dependent on new technologies to evaluate improved measurements of host (genomics), epigenetics, identification of agents (metabolomics, proteomics), new technology to evaluate both physical and social environment, and modern methods of data collection. Epidemiology does poorly in studying anything other than epidemics and collections of numerators and denominators without specific hypotheses even with improved statistical methodologies.
#> 29
#> 30
#> 31 Big data (very large data sets) are increasing in an accelerating speed. More and more data is also becoming freely available. This article is an overview of this progress and data sources related to molecular biology and public health especially from the Finnish perspective. Finland has several excellent data sources that are currently not used effectively. Big data has already produced major benefits especially in molecular biology, but benefits in public health and individual choice are only now being materialised. The paradigm in research may change dramatically, if the effort switches from article production to the production of knowledge crystals, i.e. collaborative data-based answers to research questions. Also the role of a clinician is becoming more like that of a coach.
#> 32 Evidence continues to grow supporting the idea that restorative environments, green exercise, and nature-based activities positively impact human health. Nature-deficit disorder, a journalistic term proposed to describe the ill effects of people's alienation from nature, is not yet formally recognized as a medical diagnosis. However, over the past decade, the phrase has been enthusiastically taken up by some segments of the lay public. Social media, such as Twitter, with its opportunities to gather "big data" related to public opinions, offers a medium for exploring the discourse and dissemination around nature-deficit disorder and other nature-health concepts. In this paper, we report our experience of collecting more than 175,000 tweets, applying sentiment analysis to measure positive, neutral or negative feelings, and preliminarily mapping the impact on dissemination. Sentiment analysis is currently used to investigate the repercussions of events in social networks, scrutinize opinions about products and services, and understand various aspects of the communication in Web-based communities. Based on a comparison of nature-deficit-disorder "hashtags" and more generic nature hashtags, we make recommendations for the better dissemination of public health messages through changes to the framing of messages. We show the potential of Twitter to aid in better understanding the impact of the natural environment on human health and wellbeing.
#> 33 The financial incentives for data science applications leading to improved health outcomes, such as DSRIP (bit.ly/dsrip), are well-aligned with the broad adoption of Open Data by State and Federal agencies. This creates entirely novel opportunities for analytical applications that make exclusive use of the pervasive Web Computing platform. The framework described here explores this new avenue to contextualize Health data in a manner that relies exclusively on the native JavaScript interpreter and data processing resources of the ubiquitous Web Browser. The OpenHealth platform is made publicly available, and is publicly hosted with version control and open source, at https://github.com/mathbiol/openHealth. The different data/analytics workflow architectures explored are accompanied with live applications ranging from DSRIP, such as Hospital Inpatient Prevention Quality Indicators at http://bit.ly/pqiSuffolk, to The Cancer Genome Atlas (TCGA) as illustrated by http://bit.ly/tcgascopeGBM.
#> 34 The growing scientific knowledge and technology development are leading to radical changes in biological and medical research. The prevalent lines of development deal with a pragmatic evolution of controlled clinical trials, a massive diffusion of observational research, which is progressively incorporated in clinical practice, new models and designs of clinical research, the systematic use of information technology to build up vast networks of medical centers producing huge amounts of shared data to be managed through the big data methodology, personalized as well as precision medicine, a reshaped physician-patient relationship based on a co-working principle. All this is leading to profound changes in public health governance, a renewal of clinical epidemiology and prevention, a modified structure of several specific sectors of medical care, hopefully guided by scientific evidences. A few aspects of such an evolving picture are discussed in this article.
#> 35 To assess the variability in data distributions among data sources and over time through a case study of a large multisite repository as a systematic approach to data quality (DQ). Novel probabilistic DQ control methods based on information theory and geometry are applied to the Public Health Mortality Registry of the Region of Valencia, Spain, with 512 143 entries from 2000 to 2012, disaggregated into 24 health departments. The methods provide DQ metrics and exploratory visualizations for (1) assessing the variability among multiple sources and (2) monitoring and exploring changes with time. The methods are suited to big data and multitype, multivariate, and multimodal data. The repository was partitioned into 2 probabilistically separated temporal subgroups following a change in the Spanish National Death Certificate in 2009. Punctual temporal anomalies were noticed due to a punctual increment in the missing data, along with outlying and clustered health departments due to differences in populations or in practices. Changes in protocols, differences in populations, biased practices, or other systematic DQ problems affected data variability. Even if semantic and integration aspects are addressed in data sharing infrastructures, probabilistic variability may still be present. Solutions include fixing or excluding data and analyzing different sites or time periods separately. A systematic approach to assessing temporal and multisite variability is proposed. Multisite and temporal variability in data distributions affects DQ, hindering data reuse, and an assessment of such variability should be a part of systematic DQ procedures.
#> 36 There appears to be great promise in developing targeted interventions that exploit the complex interactions between commensal microbiota colonizing various anatomic sites and the health of the human host. Although data about variations in microbiota composition across various population groups are accumulating, there remains a limited understanding of the driving forces behind the observed intraindividual and interindividual differences in microbiota. Using information derived from research on gut microbiota, the anatomic site that harbors the highest numbers of bacteria and associated microbial functions, we emphasize the need for establishing causality of observed correlations. Progress to date in establishing microbiota-based targets for novel prevention and treatment approaches is reviewed, suggesting avenues for future research endeavors. The complexities associated with the diverse interactions among environmental exposures, microbiota composition and activities and health of the human host require multidisciplinary study approaches to move knowledge beyond current descriptions of correlations without considerations of temporality and mechanisms. Information gleaned from studies of the potential contributions of diet-mediated microbiota effects on colorectal carcinogenesis will be useful in designing future large population-based studies for determining the causal pathways for microbiota contributions to human health and disease.
#> 37 Big data or population-based information has the potential to reduce uncertainty in medicine by informing clinicians about individual patient care. The objectives of this study were: 1) to explore the feasibility of extracting and displaying population-based information from an actual clinical population's database records, 2) to explore specific design features for improving population display, 3) to explore perceptions of population information displays, and 4) to explore the impact of population information display on cognitive outcomes. We used the Veteran's Affairs (VA) database to identify similar complex patients based on a similar complex patient case. Study outcomes measures were 1) preferences for population information display 2) time looking at the population display, 3) time to read the chart, and 4) appropriateness of plans with pre- and post-presentation of population data. Finally, we redesigned the population information display based on our findings from this study. The qualitative data analysis for preferences of population information display resulted in four themes: 1) trusting the big/population data can be an issue, 2) embedded analytics is necessary to explore patient similarities, 3) need for tools to control the view (overview, zoom and filter), and 4) different presentations of the population display can be beneficial to improve the display. We found that appropriateness of plans was at 60% for both groups (t9=-1.9; p=0.08), and overall time looking at the population information display was 2.3 minutes versus 3.6 minutes with experts processing information faster than non-experts (t8= -2.3, p=0.04). A population database has great potential for reducing complexity and uncertainty in medicine to improve clinical care. The preferences identified for the population information display will guide future health information technology system designers for better and more intuitive display.
#> 38 As real-world data (RWD) in health care begin to cross over to the Big Data realms, a panel of health economists was gathered to establish how well the current US policy environment further the goals of RWD and, if not, what can be done to improve matters. This report summarizes these discussions spanning the current US landscape of RWD availability and usefulness, private versus public development of RWD assets, the current inherent bias in terms of access to RWD, and guiding principles in providing quality assessments of new RWD studies. Three main conclusions emerge: (1) a business case is often required to incentivize investments in RWD assets. However, access restrictions for public data assets have failed to generate a proper market for these data and hence may have led to an underinvestment of public RWDs; (2) Very weak empirical evidence exist on for-profit entities misusing public RWD data entities to further their own agendas, which is the basis for supporting access restrictions of public RWD data; and (3) perhaps developing standardized metrics that could flag misuse of RWDs in an efficient way could help quell some of the fear of sharing public RWD assets with for-profit entities. It is hoped that these discussions and conclusions would pave the way for more rigorous and timely debates on the greater availability and accessibility of RWD assets.
#> 39 Governments use statutes, regulations, and policies, often in innovative ways, to promote health and safety. Organizations outside government, from private schools to major corporations, create rules on matters as diverse as tobacco use and paid sick leave. Very little of this activity is systematically tracked. Even as the rest of the health system is working to build, share, and use a wide range of health and social data, legal information largely remains trapped in text files and pdfs, excluded from the universe of usable data. This article makes the case for the practice of policy surveillance to help end the anomalous treatment of law in public health research and practice. Policy surveillance is the systematic, scientific collection and analysis of laws of public health significance. It meets several important needs. Scientific collection and coding of important laws and policies creates data suitable for use in rigorous evaluation studies. Policy surveillance addresses the chronic lack of readily accessible, nonpartisan information about status and trends in health legislation and policy. It provides the opportunity to build policy capacity in the public health workforce. We trace its emergence over the past fifty years, show its value, and identify major challenges ahead.
#> 40 Healthy people are important for any nation's development. Use of the Internet of Things (IoT)-based body area networks (BANs) is increasing for continuous monitoring and medical healthcare in order to perform real-time actions in case of emergencies. However, in the case of monitoring the health of all citizens or people in a country, the millions of sensors attached to human bodies generate massive volume of heterogeneous data, called "Big Data." Processing Big Data and performing real-time actions in critical situations is a challenging task. Therefore, in order to address such issues, we propose a Real-time Medical Emergency Response System that involves IoT-based medical sensors deployed on the human body. Moreover, the proposed system consists of the data analysis building, called "Intelligent Building," depicted by the proposed layered architecture and implementation model, and it is responsible for analysis and decision-making. The data collected from millions of body-attached sensors is forwarded to Intelligent Building for processing and for performing necessary actions using various units such as collection, Hadoop Processing (HPU), and analysis and decision. The feasibility and efficiency of the proposed system are evaluated by implementing the system on Hadoop using an UBUNTU 14.04 LTS coreTMi5 machine. Various medical sensory datasets and real-time network traffic are considered for evaluating the efficiency of the system. The results show that the proposed system has the capability of efficiently processing WBAN sensory data from millions of users in order to perform real-time responses in case of emergencies.
#> 41 OBJECTIVES: The aim of this manuscript is to provide a brief overview of the scientific challenges that should be addressed in order to unlock the full potential of using data from a general point of view, as well as to present some ideas that could help answer specific needs for data understanding in the field of health sciences and epidemiology. METHODS: A survey of uses and challenges of big data analyses for medicine and public health was conducted. The first part of the paper focuses on big data techniques, algorithms, and statistical approaches to identify patterns in data. The second part describes some cutting-edge applications of analyses and predictive modeling in public health. RESULTS: In recent years, we witnessed a revolution regarding the nature, collection, and availability of data in general. This was especially striking in the health sector and particularly in the field of epidemiology. Data derives from a large variety of sources, e.g. clinical settings, billing claims, care scheduling, drug usage, web based search queries, and Tweets. CONCLUSION: The exploitation of the information (data mining, artificial intelligence) relevant to these data has become one of the most promising as well challenging tasks from societal and scientific viewpoints in order to leverage the information available and making public health more efficient.
#> 42 The use of data analytics across the entire healthcare value chain, from drug discovery and development through epidemiology to informed clinical decision for patients or policy making for public health, has seen an explosion in the recent years. The increase in quantity and variety of data available together with the improvement of storing capabilities and analytical tools offer numerous possibilities to all stakeholders (manufacturers, regulators, payers, healthcare providers, decision makers, researchers) but most importantly, it has the potential to improve general health outcomes if we learn how to exploit it in the right way. This article looks at the different sources of data and the importance of unstructured data. It goes on to summarize current and potential future uses in drug discovery, development, and monitoring as well as in public and personal healthcare; including examples of good practice and recent developments. Finally, we discuss the main practical and ethical challenges to unravel the full potential of big data in healthcare and conclude that all stakeholders need to work together towards the common goal of making sense of the available data for the common good.
#> 43 With the booming of new technologies, biomedical science has transformed into digitalized, data intensive science. Massive amount of data need to be analyzed and interpreted, demand a complete pipeline to train next generation data scientists. To meet this need, the transinstitutional Big Data to Knowledge (BD2K) Initiative has been implemented since 2014, complementing other NIH institutional efforts. In this report, we give an overview the BD2K K01 mentored scientist career awards, which have demonstrated early success. We address the specific trainings needed in representative data science areas, in order to make the next generation of data scientists in biomedicine.
#> 44 Human papillomavirus (HPV) is the most common sexually transmitted infection in the United States. There are several vaccines that protect against strains of HPV most associated with cervical and other cancers. Thus, HPV vaccination has become an important component of adolescent preventive health care. As media evolves, more information about HPV vaccination is shifting to social media platforms such as Twitter. Health information consumed on social media may be especially influential for segments of society such as younger populations, as well as ethnic and racial minorities. The objectives of our study were to quantify HPV vaccine communication on Twitter, and to develop a novel methodology to improve the collection and analysis of Twitter data. We collected Twitter data using 10 keywords related to HPV vaccination from August 1, 2014 to July 31, 2015. Prospective data collection used the Twitter Search API and retrospective data collection used Twitter Firehose. Using a codebook to characterize tweet sentiment and content, we coded a subsample of tweets by hand to develop classification models to code the entire sample using machine learning procedures. We also documented the words in the 140-character tweet text most associated with each keyword. We used chi-square tests, analysis of variance, and nonparametric equality of medians to test for significant differences in tweet characteristic by sentiment. A total of 193,379 English-language tweets were collected, classified, and analyzed. Associated words varied with each keyword, with more positive and preventive words associated with "HPV vaccine" and more negative words associated with name-brand vaccines. Positive sentiment was the largest type of sentiment in the sample, with 75,393 positive tweets (38.99% of the sample), followed by negative sentiment with 48,940 tweets (25.31% of the sample). Positive and neutral tweets constituted the largest percentage of tweets mentioning prevention or protection (20,425/75,393, 27.09% and 6477/25,110, 25.79%, respectively), compared with only 11.5% of negative tweets (5647/48,940; P<.001). Nearly one-half (22,726/48,940, 46.44%) of negative tweets mentioned side effects, compared with only 17.14% (12,921/75,393) of positive tweets and 15.08% of neutral tweets (3787/25,110; P<.001). Examining social media to detect health trends, as well as to communicate important health information, is a growing area of research in public health. Understanding the content and implications of conversations that form around HPV vaccination on social media can aid health organizations and health-focused Twitter users in creating a meaningful exchange of ideas and in having a significant impact on vaccine uptake. This area of research is inherently interdisciplinary, and this study supports this movement by applying public health, health communication, and data science approaches to extend methodologies across fields.
#> 45 Autoimmune disorders impose a high burden, in terms of morbidity and mortality worldwide. Vasculitis is an autoimmune disorder that causes inflammation and destruction of blood vessels. Harold Allen Ramis, a famous American actor, director, writer, and comedian, died on the February 24, 2014, of complications of an autoimmune inflammatory vasculitis. To investigate the relation between interests and awareness of an autoimmune disease after a relevant event such as the death of a celebrity, we systematically mined Google Trends, Wikitrends, Google News, YouTube, and Twitter, in any language, from their inception until October 31, 2016. Twenty-eight thousand eight hundred fifty-two tweets; 4,133,615 accesses to Wikipedia; 6780 news; and 11,400 YouTube videos were retrieved, processed, and analyzed. The Harold Ramis death of vasculitis resulted into an increase in vasculitis-related Google searches, Wikipedia page accesses, and tweet production, documenting a peak in February 2014. No trend could be detected concerning uploading YouTube videos. The usage of Big Data is promising in the fields of immunology and rheumatology. Clinical practitioners should be aware of this emerging phenomenon.
#> 46 To extend the reach and relevance of epidemiology for public health practice, the science needs be broadened beyond etiologic research, to link more strongly with emerging technologies and to acknowledge key societal transformations. This new focus for epidemiology and its implications for epidemiologic training can be considered in the context of macro trends affecting society, including a greater focus on upstream causes of disease, shifting demographics, the Affordable Care Act and health care system reform, globalization, changing health communication environment, growing centrality of team and transdisciplinary science, emergence of translational sciences, greater focus on accountability, big data, informatics, high-throughput technologies ("omics"), privacy changes, and the evolving funding environment. This commentary describes existing approaches to and competencies for training in epidemiology, maps macro trends with competencies, highlights an example of competency-based education in the Epidemic Intelligence Service of Centers for Disease Control and Prevention, and suggests expanded and more dynamic training approaches. A reexamination of current approaches to epidemiologic training is needed.
#> 47 As a leading cause of severe disability and death, stroke places an enormous burden on Chinese society. A nationwide stroke screening platform called CSDC (China Stoke Data Center) has been built to support the national stroke prevention program and stroke clinical research since 2011. This platform is composed of a data integration system and a big data analysis system. The data integration system is used to collect information on risk factors, diagnosis history, treatment, and sociodemographic characteristics and stroke patients' EMR. The big data analysis system support decision making of stroke control and prevention, clinical evaluation and research. In this paper, the design and implementation of CSDC are illustrated, and some application results are presented. This platform is expected to provide rich data and powerful tool support for stroke control and prevention in China.
#> 48 The advances made in genomics and molecular tools aid public health programs in the investigation of outbreaks and control of diseases by taking advantage of the precision medicine. Precision medicine means "segregating the individuals into subpopulations who vary in their disease susceptibility and response to a precise treatment" and not merely designing of drugs or creation of medical devices. By 2017, the United Kingdom 100,000 Genomes Project is expected to sequence 100,000 genomes from 70,000 patients. Similarly, the Precision Medicine Initiative of the United States plans to increase population-based genome sequencing and link it with clinical data. A national cohort of around 1 million people is to be established in the long term, to investigate the genetic and environmental determinants of health and disease, and further integrated to their electronic health records that are optional. Precision public health can be seen as administering the right intervention to the needy population at an appropriate time. Precision medicine originates from a wet-lab while evidence-based medicine is nurtured in a clinic. Linking the quintessential basic science research and clinical practice is necessary. In addition, new technologies to employ and analyze data in an integrated and dynamic way are essential for public health and precision medicine. The transition from evidence-based approach in public health to genomic approach to individuals with a paradigm shift of a "reactive" medicine to a more "proactive" and personalized health care may sound exceptional. However, a population perspective is needed for the precision medicine to succeed.
#> 49 Pathogen genomics has the potential to transform the clinical and public health management of infectious diseases through improved diagnosis, detection and tracking of antimicrobial resistance and outbreak control. However, the wide-ranging benefits of this technology can only fully be realized through the timely collation, integration and sharing of genomic and clinical/epidemiological metadata by all those involved in the delivery of genomic-informed services. As part of our review on bringing pathogen genomics into 'health-service' practice, we undertook extensive stakeholder consultation to examine the factors integral to achieving effective data sharing and integration. Infrastructure tailored to the needs of clinical users, as well as practical support and policies to facilitate the timely and responsible sharing of data with relevant health authorities and beyond, are all essential. We propose a tiered data sharing and integration model to maximize the immediate and longer term utility of microbial genomics in healthcare. Realizing this model at the scale and sophistication necessary to support national and international infection management services is not uncomplicated. Yet the establishment of a clear data strategy is paramount if failures in containing disease spread due to inadequate knowledge sharing are to be averted, and substantial progress made in tackling the dangers posed by infectious diseases.
#> 50 Big data can be used to assess perceptions about public health issues. This study assessed social media data from Twitter to inform communication campaigns to promote HIV testing and reduce discrimination related to HIV/AIDS or towards key populations to the HIV epidemic, and its potential utility to evaluate such campaigns through HIV testing uptake. Tweets from Brazil were collected from January 2014 to March 2015 and filtered by four categories of keywords including discrimination, HIV prevention, HIV testing, and HIV campaigns. In total over 100,000 geo-located tweets were extracted and analyzed. A dynamic online dashboard updated daily allowed mapping trends, anomalies and influencers, and enabled its use for feedback to campaigns, including correcting misconceptions. These results encourage the use of social networking data for improved messaging in campaigns. Clinical HIV test data was collected monthly from the city of Curitiba and compared to the number of tweets mapped to the city showing a moderate positive correlation (r = 0.39). Results are limited due to the availability of the HIV testing data. The potential of social media as a proxy for HIV testing uptake needs further validation, which can only be done with higher frequency and higher spatial granularity of service delivery data, enabling comparisons with the social media data. Such timely information could empower early response immediate media messaging to support programmatic efforts, such as HIV prevention, testing, and treatment scale up.
#> 51 Schistosomiasis is a parasitic infection that is widespread in sub-Saharan Africa, where it represents a major health problem. We study the drivers of its geographical distribution in Senegal via a spatially explicit network model accounting for epidemiological dynamics driven by local socioeconomic and environmental conditions, and human mobility. The model is parameterized by tapping several available geodatabases and a large dataset of mobile phone traces. It reliably reproduces the observed spatial patterns of regional schistosomiasis prevalence throughout the country, provided that spatial heterogeneity and human mobility are suitably accounted for. Specifically, a fine-grained description of the socioeconomic and environmental heterogeneities involved in local disease transmission is crucial to capturing the spatial variability of disease prevalence, while the inclusion of human mobility significantly improves the explanatory power of the model. Concerning human movement, we find that moderate mobility may reduce disease prevalence, whereas either high or low mobility may result in increased prevalence of infection. The effects of control strategies based on exposure and contamination reduction via improved access to safe water or educational campaigns are also analyzed. To our knowledge, this represents the first application of an integrative schistosomiasis transmission model at a whole-country scale.
#> 52
#> 53
#> 54 This article reflects on the activities of the Ethics Committee of the American College of Epidemiology (ACE). Members of the Ethics Committee identified an opportunity to elaborate on knowledge gained since the inception of the original Ethics Guidelines published by the ACE Ethics and Standards of Practice Committee in 2000. The ACE Ethics Committee presented a symposium session at the 2016 Epidemiology Congress of the Americas in Miami on the evolving complexities of ethics and epidemiology as it pertains to "big data." This article presents a summary and further discussion of that symposium session. Three topic areas were presented: the policy implications of big data and computing, the fallacy of "secondary" data sources, and the duty of citizens to contribute to big data. A balanced perspective is needed that provides safeguards for individuals but also furthers research to improve population health. Our in-depth review offers next steps for teaching of ethics and epidemiology, as well as for epidemiological research, public health practice, and health policy. To address contemporary topics in the area of ethics and epidemiology, the Ethics Committee hosted a symposium session on the timely topic of big data. Technological advancements in clinical medicine and genetic epidemiology research coupled with rapid advancements in data networks, storage, and computation at a lower cost are resulting in the growth of huge data repositories. Big data increases concerns about data integrity; informed consent; protection of individual privacy, confidentiality, and harm; data reidentification; and the reporting of faulty inferences.
#> 55 As Decision Support Systems start to play a significant role in decision making, especially in the field of public-health policy making, we present an initial attempt to formulate such a system in the concept of public health policy making for hearing loss related problems. Justification for the system's conceptual architecture and its key functionalities are presented. The introduction of the EVOTION DSS sets a key innovation and a basis for paradigm shift in policymaking, by incorporating relevant models, big data analytics and generic demographic data. Expected outcomes for this joint effort are discussed from a public-health point of view.
#> 56 Most deaths occurring due to a surgical intervention happen postoperatively rather than during surgery. The current standard of care in many hospitals cannot fully cope with detecting and addressing post-surgical deterioration in time. For millions of patients, this deterioration is left unnoticed, leading to increased mortality and morbidity. Postoperative deterioration detection currently relies on general scores that are not fully able to cater for the complex post-operative physiology of surgical patients. In the last decade however, advanced risk and warning scoring techniques have started to show encouraging results in terms of using the large amount of data available peri-operatively to improve postoperative deterioration detection. Relevant literature has been carefully surveyed to provide a summary of the most promising approaches as well as how they have been deployed in the perioperative domain. This work also aims to highlight the opportunities that lie in personalizing the models developed for patient deterioration for these particular post-surgical patients and make the output more actionable. The integration of pre- and intra-operative data, e.g. comorbidities, vitals, lab data, and information about the procedure performed, in post-operative early warning algorithms would lead to more contextualized, personalized, and adaptive patient modelling. This, combined with careful integration in the clinical workflow, would result in improved clinical decision support and better post-surgical care outcomes.
#> 57 Spatial big data have the velocity, volume, and variety of big data sources and contain additional geographic information. Digital data sources, such as medical claims, mobile phone call data records, and geographically tagged tweets, have entered infectious diseases epidemiology as novel sources of data to complement traditional infectious disease surveillance. In this work, we provide examples of how spatial big data have been used thus far in epidemiological analyses and describe opportunities for these sources to improve disease-mitigation strategies and public health coordination. In addition, we consider the technical, practical, and ethical challenges with the use of spatial big data in infectious disease surveillance and inference. Finally, we discuss the implications of the rising use of spatial big data in epidemiology to health risk communication, and public health policy recommendations and coordination across scales.
#> 58 The issue of public health in Korea has attracted significant attention given the aging of the country's population, which has created many types of social problems. The approach proposed in this article aims to address dementia, one of the most significant symptoms of aging and a public health care issue in Korea. The Korean National Health Insurance Service Senior Cohort Database contains personal medical data of every citizen in Korea. There are many different medical history patterns between individuals with dementia and normal controls. The approach used in this study involved examination of personal medical history features from personal disease history, sociodemographic data, and personal health examinations to develop a prediction model. The prediction model used a support-vector machine learning technique to perform a 10-fold cross-validation analysis. The experimental results demonstrated promising performance (80.9% F-measure). The proposed approach supported the significant influence of personal medical history features during an optimal observation period. It is anticipated that a biomedical "big data"-based disease prediction model may assist the diagnosis of any disease more correctly.
#> 59 The volume and velocity of data are growing rapidly and big data analytics are being applied to these data in many fields. Population and public health researchers may be unfamiliar with the terminology and statistical methods used in big data. This creates a barrier to the application of big data analytics. The purpose of this glossary is to define terms used in big data and big data analytics and to contextualise these terms. We define the five Vs of big data and provide definitions and distinctions for data mining, machine learning and deep learning, among other terms. We provide key distinctions between big data and statistical analysis methods applied to big data. We contextualise the glossary by providing examples where big data analysis methods have been applied to population and public health research problems and provide brief guidance on how to learn big data analysis methods.
#> 60 Bioinformatics is now intrinsic to life science research, but the past decade has witnessed a continuing deficiency in this essential expertise. Basic data stewardship is still taught relatively rarely in life science education programmes, creating a chasm between theory and practice, and fuelling demand for bioinformatics training across all educational levels and career roles. Concerned by this, surveys have been conducted in recent years to monitor bioinformatics and computational training needs worldwide. This article briefly reviews the principal findings of a number of these studies. We see that there is still a strong appetite for short courses to improve expertise and confidence in data analysis and interpretation; strikingly, however, the most urgent appeal is for bioinformatics to be woven into the fabric of life science degree programmes. Satisfying the relentless training needs of current and future generations of life scientists will require a concerted response from stakeholders across the globe, who need to deliver sustainable solutions capable of both transforming education curricula and cultivating a new cadre of trainer scientists.
#> 61
#> 62 Steady changes in society present challenges to constructive cooperation between stakeholders in the diverse PH landscape of Germany through individualism, globalisation, medical progress, digitalisation, etc. Working group 8 therefore suggests that the PH community should build new internal structures, in order to be able to respond jointly to external challenges, facilitate networking amongst the actors and speak with one voice, when needed. The suggestion is to establish an office that has the task to organise further meetings, harmonize written joint statements and moderate the dialogue amongst peers.
#> 63 To provide an open source, interoperable, and scalable data quality assessment tool for evaluation and visualization of completeness and conformance in electronic health record (EHR) data repositories. This article describes the tool's design and architecture and gives an overview of its outputs using a sample dataset of 200 000 randomly selected patient records with an encounter since January 1, 2010, extracted from the Research Patient Data Registry (RPDR) at Partners HealthCare. All the code and instructions to run the tool and interpret its results are provided in the Supplementary Appendix. DQe-c produces a web-based report that summarizes data completeness and conformance in a given EHR data repository through descriptive graphics and tables. Results from running the tool on the sample RPDR data are organized into 4 sections: load and test details, completeness test, data model conformance test, and test of missingness in key clinical indicators. Open science, interoperability across major clinical informatics platforms, and scalability to large databases are key design considerations for DQe-c. Iterative implementation of the tool across different institutions directed us to improve the scalability and interoperability of the tool and find ways to facilitate local setup. EHR data quality assessment has been hampered by implementation of ad hoc processes. The architecture and implementation of DQe-c offer valuable insights for developing reproducible and scalable data science tools to assess, manage, and process data in clinical data repositories.
#> 64
#> 65 Big Data is a diffuse term, which can be described as an approach to linking gigantic and often unstructured data sets. Big Data is used in many corporate areas. For Public Health (PH), however, Big Data is not a well-developed topic. In this article, Big Data is explained according to the intention of use, information efficiency, prediction and clustering. Using the example of application in science, patient care, equal opportunities and smart cities, typical challenges and open questions of Big Data for PH are outlined. In addition to the inevitable use of Big Data, networking is necessary, especially with knowledge-carriers and decision-makers from politics and health care practice.
#> 66 Internet-based surveillance methods for vector-borne diseases (VBDs) using "big data" sources such as Google, Twitter, and internet newswire scraping have recently been developed, yet reviews on such "digital disease detection" methods have focused on respiratory pathogens, particularly in high-income regions. Here, we present a narrative review of the literature that has examined the performance of internet-based biosurveillance for diseases caused by vector-borne viruses, parasites, and other pathogens, including Zika, dengue, other arthropod-borne viruses, malaria, leishmaniasis, and Lyme disease across a range of settings, including low- and middle-income countries. The fundamental features, advantages, and drawbacks of each internet big data source are presented for those with varying familiarity of "digital epidemiology." We conclude with some of the challenges and future directions in using internet-based biosurveillance for the surveillance and control of VBD.
#> 67
#> 68 Public health relies on technologies to produce and analyse data, as well as effectively develop and implement policies and practices. An example is the public health practice of epidemiology, which relies on computational technology to monitor the health status of populations, identify disadvantaged or at risk population groups and thereby inform health policy and priority setting. Critical to achieving health improvements for the underserved population of people living with rare diseases is early diagnosis and best care. In the rare diseases field, the vast majority of diseases are caused by destructive but previously difficult to identify protein-coding gene mutations. The reduction in cost of genetic testing and advances in the clinical use of genome sequencing, data science and imaging are converging to provide more precise understandings of the 'person-time-place' triad. That is: who is affected (people); when the disease is occurring (time); and where the disease is occurring (place). Consequently we are witnessing a paradigm shift in public health policy and practice towards 'precision public health'. Patient and stakeholder engagement has informed the need for a national public health policy framework for rare diseases. The engagement approach in different countries has produced highly comparable outcomes and objectives. Knowledge and experience sharing across the international rare diseases networks and partnerships has informed the development of the Western Australian Rare Diseases Strategic Framework 2015-2018 (RD Framework) and Australian government health briefings on the need for a National plan. The RD Framework is guiding the translation of genomic and other technologies into the Western Australian health system, leading to greater precision in diagnostic pathways and care, and is an example of how a precision public health framework can improve health outcomes for the rare diseases population. Five vignettes are used to illustrate how policy decisions provide the scaffolding for translation of new genomics knowledge, and catalyze transformative change in delivery of clinical services. The vignettes presented here are from an Australian perspective and are not intended to be comprehensive, but rather to provide insights into how a new and emerging 'precision public health' paradigm can improve the experiences of patients living with rare diseases, their caregivers and families. The conclusion is that genomic public health is informed by the individual and family needs, and the population health imperatives of an early and accurate diagnosis; which is the portal to best practice care. Knowledge sharing is critical for public health policy development and improving the lives of people living with rare diseases.
#> 69 The exposome is defined as "the totality of environmental exposures encountered from birth to death" and was developed to address the need for comprehensive environmental exposure assessment to better understand disease etiology. Due to the complexity of the exposome, significant efforts have been made to develop technologies for longitudinal, internal and external exposure monitoring, and bioinformatics to integrate and analyze datasets generated. Our objectives were to bring together leaders in the field of exposomics, at a recent Symposium on "Lifetime Exposures and Human Health: The Exposome," held at Yale School of Public Health. Our aim was to highlight the most recent technological advancements for measurement of the exposome, bioinformatics development, current limitations, and future needs in environmental health. In the discussions, an emphasis was placed on moving away from a one-chemical one-health outcome model toward a new paradigm of monitoring the totality of exposures that individuals may experience over their lifetime. This is critical to better understand the underlying biological impact on human health, particularly during windows of susceptibility. Recent advancements in metabolomics and bioinformatics are driving the field forward in biomonitoring and understanding the biological impact, and the technological and logistical challenges involved in the analyses were highlighted. In conclusion, further developments and support are needed for large-scale biomonitoring and management of big data, standardization for exposure and data analyses, bioinformatics tools for co-exposure or mixture analyses, and methods for data sharing.
#> 70 The ability to synthesize and analyze massive amounts of data is critical to the success of organizations, including those that involve global health. As countries become highly interconnected, increasing the risk for pandemics and outbreaks, the demand for big data is likely to increase. This requires a global health workforce that is trained in the effective use of big data. To assess implementation of big data training in global health, we conducted a pilot survey of members of the Consortium of Universities of Global Health. More than half the respondents did not have a big data training program at their institution. Additionally, the majority agreed that big data training programs will improve global health deliverables, among other favorable outcomes. Given the observed gap and benefits, global health educators may consider investing in big data training for students seeking a career in global health.
#> 71 Genomic data, i.e. measurement of variation in the complete genome has revolutionized genetic research and changed our understanding of the pathogenetic mechanisms of diseases. Genomic data in combination with Finnish special strengths - population history, the nation's comprehensive health records and a strong research tradition in genetic epidemiology - has made Finland a testing laboratory for diseases of public health importance. At the same time, genomic research has changed into statistical evaluation of large masses of data - big data. New research knowledge is now descending to the prevention and treatment of diseases, and this will affect future medical practices. In this reform, Finland has a chance to be a key player. The change is, however, global, and the world will not wait that Finland is ready, but instead we have to take care of it ourselves. When successful, new kind of research will help better allocate health care resources, provide more individualized care and stimulate businesses based on new technology.
#> 72 The digital world is generating data at a staggering and still increasing rate. While these "big data" have unlocked novel opportunities to understand public health, they hold still greater potential for research and practice. This review explores several key issues that have arisen around big data. First, we propose a taxonomy of sources of big data to clarify terminology and identify threads common across some subtypes of big data. Next, we consider common public health research and practice uses for big data, including surveillance, hypothesis-generating research, and causal inference, while exploring the role that machine learning may play in each use. We then consider the ethical implications of the big data revolution with particular emphasis on maintaining appropriate care for privacy in a world in which technology is rapidly changing social norms regarding the need for (and even the meaning of) privacy. Finally, we make suggestions regarding structuring teams and training to succeed in working with big data in research and practice.
#> 73
#> 74 Dendritic spine morphology is heterogeneous and highly dynamic. To study the changing or aberrant morphology in test setups, often spines from several neurons from a few experimental units e.g. mice or primary neuronal cultures are measured. This strategy results in a multilevel data structure, which, when not properly addressed, has a high risk of producing false positive and false negative findings. We used mixed-effects models to deal with data with a multilevel data structure and compared this method to analyses at each level. We apply these statistical tests to a dataset of dendritic spine morphology parameters to illustrate advantages of multilevel mixed-effects model, and disadvantages of other models. We present an application of mixed-effects models for analyzing dendritic spine morphology datasets while correcting for the data structure. We further show that analyses at spine level and aggregated levels do not adequately account for the data structure, and that they may lead to erroneous results. We highlight the importance of data structure in dendritic spine morphology analyses and highly recommend the use of mixed-effects models or other appropriate statistical methods to deal with multilevel datasets. Mixed-effects models are easy to use and superior to commonly used methods by including the data structure and the addition of other explanatory variables, for example sex, and age, etc., as well as interactions between variables or between variables and level identifiers.
#> 75 The objective of this paper is to identify the extent to which real world data (RWD) is being utilized, or could be utilized, at scale in drug development. Through screening peer-reviewed literature, we have cited specific examples where RWD can be used for biomarker discovery or validation, gaining a new understanding of a disease or disease associations, discovering new markers for patient stratification and targeted therapies, new markers for identifying persons with a disease, and pharmacovigilance. None of the papers meeting our criteria was specifically geared toward novel targets or indications in the biopharmaceutical sector; the majority were focused on the area of public health, often sponsored by universities, insurance providers or in combination with public health bodies such as national insurers. The field is still in an early phase of practical application, and is being harnessed broadly where it serves the most direct need in public health applications in early, rare and novel disease incidents. However, these exemplars provide a valuable contribution to insights on the use of RWD to create novel, faster and less invasive approaches to advance disease understanding and biomarker discovery. We believe that pharma needs to invest in making better use of Electronic Health Records and the need for more precompetitive collaboration to grow the scale of this 'big denominator' capability, especially given the needs of precision medicine research.
#> 76
#> 77
#> 78
#> 79
#> 80
#> 81
#> 82
#> 83 The present study evaluated the potential use of Twitter data for providing risk indices of STIs. We developed online risk indices (ORIs) based on tweets to predict new HIV, gonorrhea, and chlamydia diagnoses, across U.S. counties and across 5 years. We analyzed over one hundred million tweets from 2009 to 2013 using open-vocabulary techniques and estimated the ORIs for a particular year by entering tweets from the same year into multiple semantic models (one for each year). The ORIs were moderately to strongly associated with the actual rates (.35 < rs < .68 for 93% of models), both nationwide and when applied to single states (California, Florida, and New York). Later models were slightly better than older ones at predicting gonorrhea and chlamydia, but not at predicting HIV. The proposed technique using free social media data provides signals of community health at a high temporal and spatial resolution.
#> 84 There is a growing interest in using OpenStreetMap [OSM] data in health research. We evaluate the usefulness of OSM data for researching the spatial availability of alcohol, a field which has been hampered by data access difficulties. We find OSM data is about 50% complete, which appears adequate for replicating findings from other studies using alcohol licensing data. Further, we show how OSM quality metrics can be used to select areas with more complete alcohol data. The ease of access and use may create opportunities for analysts and researchers seeking to understand broad patterns of alcohol availability.
#> 85 The holistic management of hearing loss (HL) requires an understanding of factors that predict hearing aid (HA) use and benefit beyond the acoustics of listening environments. Although several predictors have been identified, no study has explored the role of audiological, cognitive, behavioural and physiological data nor has any study collected real-time HA data. This study will collect 'big data', including retrospective HA logging data, prospective clinical data and real-time data via smart HAs, a mobile application and biosensors. The main objective is to enable the validation of the EVOTION platform as a public health policy-making tool for HL. This will be a big data international multicentre study consisting of retrospective and prospective data collection. Existing data from approximately 35 000 HA users will be extracted from clinical repositories in the UK and Denmark. For the prospective data collection, 1260 HA candidates will be recruited across four clinics in the UK and Greece. Participants will complete a battery of audiological and other assessments (measures of patient-reported HA benefit, mood, cognition, quality of life). Patients will be offered smart HAs and a mobile phone application and a subset will also be given wearable biosensors, to enable the collection of dynamic real-life HA usage data. Big data analytics will be used to detect correlations between contextualised HA usage and effectiveness, and different factors and comorbidities affecting HL, with a view to informing public health decision-making. Ethical approval was received from the London South East Research Ethics Committee (17/LO/0789), the Hippokrateion Hospital Ethics Committee (1847) and the Athens Medical Center's Ethics Committee (KM140670). Results will be disseminated through national and international events in Greece and the UK, scientific journals, newsletters, magazines and social media. Target audiences include HA users, clinicians, policy-makers and the general public. NCT03316287; Pre-results.
#> 86 The growth and diversification of nursing theory, nursing terminology, and nursing data enable a convergence of theory- and data-driven discovery in the era of big data research. Existing datasets can be viewed through theoretical and terminology perspectives using visualization techniques in order to reveal new patterns and generate hypotheses. The Omaha System is a standardized terminology and metamodel that makes explicit the theoretical perspective of the nursing discipline and enables terminology-theory testing research. The purpose of this paper is to illustrate the approach by exploring a large research dataset consisting of 95 variables (demographics, temperature measures, anthropometrics, and standardized instruments measuring quality of life and self-efficacy) from a theory-based perspective using the Omaha System. Aims were to (a) examine the Omaha System dataset to understand the sample at baseline relative to Omaha System problem terms and outcome measures, (b) examine relationships within the normalized Omaha System dataset at baseline in predicting adherence, and (c) examine relationships within the normalized Omaha System dataset at baseline in predicting incident venous ulcer. Variables from a randomized clinical trial of a cryotherapy intervention for the prevention of venous ulcers were mapped onto Omaha System terms and measures to derive a theoretical framework for the terminology-theory testing study. The original dataset was recoded using the mapping to create an Omaha System dataset, which was then examined using visualization to generate hypotheses. The hypotheses were tested using standard inferential statistics. Logistic regression was used to predict adherence and incident venous ulcer. Findings revealed novel patterns in the psychosocial characteristics of the sample that were discovered to be drivers of both adherence (Mental health Behavior: OR = 1.28, 95% CI [1.02, 1.60]; AUC = .56) and incident venous ulcer (Mental health Behavior: OR = 0.65, 95% CI [0.45, 0.93]; Neuro-musculo-skeletal function Status: OR = 0.69, 95% CI [0.47, 1.00]; male: OR = 3.08, 95% CI [1.15, 8.24]; not married: OR = 2.70, 95% CI [1.00, 7.26]; AUC = .76). The Omaha System was employed as ontology, nursing theory, and terminology to bridge data and theory and may be considered a data-driven theorizing methodology. Novel findings suggest a relationship between psychosocial factors and incident venous ulcer outcomes. There is potential to employ this method in further research, which is needed to generate and test hypotheses from other datasets to extend scientific investigations from existing data.
#> 87
#> 88 This letter provides an overview of the application of big data in health care system to improve quality of care, including predictive modelling for risk and resource use, precision medicine and clinical decision support, quality of care and performance measurement, public health and research applications, among others. The author delineates the tremendous potential for big data analytics and discuss how it can be successfully implemented in clinical practice, as an important component of a learning health-care system.
#> 89
#> 90 Heterogeneity of human beings leads to think and react differently to social phenomena. Awareness and homophily drive people to weigh interactions in social multiplex networks, influencing a potential contagion effect. To quantify the impact of heterogeneity on spreading dynamics, we propose a model of coevolution of social contagion and awareness, through the introduction of statistical estimators, in a weighted multiplex network. Multiplexity of networked individuals may trigger propagation enough to produce effects among vulnerable subjects experiencing distress, mental disorder, which represent some of the strongest predictors of suicidal behaviours. The exposure to suicide is emotionally harmful, since talking about it may give support or inadvertently promote it. To disclose the complex effect of the overlapping awareness on suicidal ideation spreading among disordered people, we also introduce a data-driven approach by integrating different types of data. Our modelling approach unveils the relationship between distress and mental disorders propagation and suicidal ideation spreading, shedding light on the role of awareness in a social network for suicide prevention. The proposed model is able to quantify the impact of overlapping awareness on suicidal ideation spreading and our findings demonstrate that it plays a dual role on contagion, either reinforcing or delaying the contagion outbreak.
#> 91
#> 92 This paper reports on a generic framework to provide clinicians with the ability to conduct complex analyses on elaborate research topics using cascaded queries to resolve internal time-event dependencies in the research questions, as an extension to the proposed Clinical Data Analytics Language (CliniDAL). A cascaded query model is proposed to resolve internal time-event dependencies in the queries which can have up to five levels of criteria starting with a query to define subjects to be admitted into a study, followed by a query to define the time span of the experiment. Three more cascaded queries can be required to define control groups, control variables and output variables which all together simulate a real scientific experiment. According to the complexity of the research questions, the cascaded query model has the flexibility of merging some lower level queries for simple research questions or adding a nested query to each level to compose more complex queries. Three different scenarios (one of them contains two studies) are described and used for evaluation of the proposed solution. CliniDAL's complex analyses solution enables answering complex queries with time-event dependencies at most in a few hours which manually would take many days. An evaluation of results of the research studies based on the comparison between CliniDAL and SQL solutions reveals high usability and efficiency of CliniDAL's solution.
#> 93 Precision medicine is making an impact on patients, health care delivery systems, and research participants in ways that were only imagined fifteen years ago when the human genome was first sequenced. Discovery of disease-causing and drug-response genetic variants has accelerated, while adoption into clinical medicine has lagged. We define precision medicine and the stakeholder community required to enable its integration into research and health care. We explore the intersection of data science, analytics, and precision medicine in the formation of health systems that carry out research in the context of clinical care and that optimize the tools and information used to deliver improved patient outcomes. We provide examples of real-world impact and conclude with a policy and economic agenda necessary for the adoption of this new paradigm of health care both in the United States and globally.
#> 94 PURPOSE: We aimed to analyze the incidence, time to occurrence, and congestive heart failure (CHF) risk factors for early breast cancer patients treated with anthracycline (AC)-based chemotherapy and/or trastuzumab (T) therapy in Korea. METHODS: We included female patients > 19 years old from the Health Insurance Review and Assessment Service database who had no prior CHF history and had been diagnosed with early breast cancer between January 2007 and October 2016. RESULTS: We included 83,544 patients in our analysis. In terms of crude incidence for CHF, AC followed by T showed the highest incidence (6.3%). However, 3.1 and 4.2% of the patients had CHF due to AC-based chemotherapy and non-AC followed by T, respectively. The median times to occurrence of CHF were different according to adjuvant treatments, approximately 2 years (701.0 days) in the AC-based chemotherapy group vs 1 year (377.5 days) AC followed by T group. T therapy was associated with earlier development of CHF irrespective of previous chemotherapy, but late risk of CHF 1.2 years after T therapy rapidly decreased in both chemotherapy groups. Multivariate Cox regression analysis revealed that the adjusted hazard ratio for CHF was increased in the group of older patients (≥ 65 years old) who underwent AC followed by T therapy, with Charlson comorbidity index scores of ≥ 2. CONCLUSIONS: Our study showed that neo-/adjuvant chemotherapy using T irrespective of previous chemotherapy (AC or non-AC) was associated with significantly increased risk of CHF compared with AC-based chemotherapy in Korean patients with early breast cancer.
#> 95 While efficacy and safety data collected from randomized clinical trials are the evidentiary standard for determining market authorization, this alone may no longer be sufficient to address the needs of key stakeholders (regulators, providers, and payers) and guarantee long-term success of pharmaceutical products. There is a heightened interest from stakeholders on understanding the use of real-world evidence (RWE) to substantiate benefit-risk assessment and support the value of a new drug. This review provides an overview of real-world data (RWD) and related advances in the regulatory framework, and discusses their impact on clinical research and development. A framework for linking drug development decisions with the value proposition of the drug, utilizing pharmacokinetic-pharmacodynamic-pharmacoeconomic models, is introduced. The summary presented here is based on the presentations and discussion at the symposium entitled Innovation at the Intersection of Clinical Trials and Real-World Data to Advance Patient Care at the American Society for Clinical Pharmacology and Therapeutics (ASCPT) 2017 Annual Meeting.
#> 96 It seems no longer possible to produce knowledge, even biological knowledge regardless of social, cultural and economic environments in which they were observed. Therefore never the term "social medicine" or more generally "social biology" has appeared more appropriate. This way of linking the social and the biological exceeds the sole social medicine by involving also other medical disciplines. As such, forensics, whose an important activity is represented by clinical forensics in charge of types of violence (physical, psychological, sexual, abuse) and persons held in custody could see its practice heavily modified through the use of various data describing both the clinical situation of patients but also their context of life. A better understanding of mechanisms of violence development and potentially a better prevention of these situations allow forensics not to be restricted (or seen as limited to) a "descriptive medicine", but to be seen also as a preventive and curative medicine. In this evolution, the potential contribution of Big Data appears significant insofar as information on a wide range of characteristics of the environment or context of life (social, economic, cultural) can be collected and be connected with health data, for example to develop models on social determinants of health. In the common thinking, the use of a larger amount of data and consequently a multiplicity of information via a multiplicity of databases would allow to access to a greater objectivity of a reality that we are approaching by fragmented viewpoints otherwise. In this light, the "bigger" and "more varied" would serve the "better" or at least the "more true". But to be able to consider together or to link different databases it will be necessary to know how to handle this diversity regarding hypotheses made to build databases and regarding their purposes (by whom, for what bases have been made). 
It will be equally important to question the representativeness of situations that led to the creation of a database and to question the validity of information and data according to the secondary or tertiary uses anticipated from their original purpose. This step of data validity control for the anticipated use is a sine qua non condition, particularly in the field of public health, to guarantee a sufficient level of quality and exploit in the best way the benefits of Big Data approaches.
#> 97 This article examines how digital epidemiology and eHealth coalesce into a powerful health surveillance system that fundamentally changes present notions of body and health. In the age of Big Data and Quantified Self, the conceptual and practical distinctions between individual and population body, personal and public health, surveillance and health care are diminishing. Expanding on Armstrong's concept of "surveillance medicine" to "quantified self medicine" and drawing on my own research on the symbolic power of statistical constructs in medical encounters, this article explores the impact of digital health surveillance on people's perceptions, actions and subjectivities. It discusses the epistemic confusions and paradoxes produced by a health care system that increasingly treats patients as risk profiles and prompts them to do the same, namely to perceive and manage themselves as a bundle of health and security risks. Since these risks are necessarily constructed in reference to epidemiological data that postulate a statistical gaze, they also construct or make-up disembodied "individuals on alert".
#> 98
#> 99 Abdominal obesity has become an important public health issue in China. Socioeconomic disparities are thought to be closely related to the prevalence of abdominal obesity. Exploring socioeconomic disparities in abdominal obesity over the life course in China could inform the design of new interventions to prevent and control abdominal obesity. The China Health and Nutrition Survey (CHNS) was a prospective household-based study involving seven rounds of surveys between 1993 and 2011. Twenty three thousand, two hundred and forty-three individuals were followed up over an 18-year period. The mixed effects models with random intercepts were used to assess the effects on abdominal obesity. Six key socioeconomic indicators, with age and age-squared added to the models, were used to identify socioeconomic disparities in abdominal obesity over the adult life course. Prevalence of abdominal obesity increased non-linearly with age over the adult life course. Abdominal obesity was more prevalent in younger than older birth cohorts. 
Positive period effects on the prevalence of abdominal obesity were substantial from 1993 to 2011, and were stronger among males than females. Prevalence of abdominal obesity was higher among ethnic Han Chinese and among the married [coefficient (95% confidence intervals): 0.03(0.003, 0.057) and 0.035(0.022, 0.047), respectively], and was lower among males [coefficient (95% confidence intervals): - 0.065(- 0.075,-0.055)]. A higher-level of urbanization and higher household income increased the probability of abdominal obesity [coefficient (95% confidence intervals): 0.160(0.130, 0.191), 3.47E-4 (2.23E-4, 4.70E-4), respectively], while individuals with more education were less likely to experience abdominal obesity [coefficient (95% confidence intervals): - 0.222 (- 0.289, - 0.155)] across adulthood. In China, abdominal obesity increased substantially in more recent cohorts. And people with lower educational attainment, with higher household income, or living in more urbanized communities may be the disadvantaged population of abdominal obesity over the adult life course. Effective interventions targeting the vulnerable population need to be developed.
#> 100 Depression is a complex disorder with large interindividual variability in symptom profiles that often occur alongside symptoms of other psychiatric domains, such as anxiety. A dimensional and symptom-based approach may help refine the characterization of depressive and anxiety disorders and thus aid in establishing robust biomarkers. We use resting-state functional magnetic resonance imaging to assess the brain functional connectivity correlates of a symptom-based clustering of individuals. We assessed symptoms using the Beck Depression and Beck Anxiety Inventories in individuals with or without a history of depression (N = 1084) and high-dimensional data clustering to form subgroups based on symptom profiles. We compared dynamic and static functional connectivity between subgroups in a subset of the total sample (n = 252). We identified five subgroups with distinct symptom profiles, which cut across diagnostic boundaries with different total severity, symptom patterns, and centrality. 
For instance, inability to relax, fear of the worst, and feelings of guilt were among the most severe symptoms in subgroups 1, 2, and 3, respectively. The distribution of individuals was 32%, 25%, 22%, 10%, and 11% in subgroups 1 to 5, respectively. These subgroups showed evidence of differential static brain-connectivity patterns, in particular comprising a frontotemporal network. In contrast, we found no significant associations with clinical sum scores, dynamic functional connectivity, or global connectivity. Adding to the pursuit of individual-based treatment, subtyping based on a dimensional conceptualization and unique constellations of anxiety and depression symptoms is supported by distinct patterns of static functional connectivity in the brain.
#> 101
#> 102
#> 103 For a variety of reasons including cheap computing, widespread adoption of electronic medical records, digitalization of imaging and biosignals, and rapid development of novel technologies, the amount of health care data being collected, recorded, and stored is increasing at an exponential rate. Yet despite these advances, methods for the valid, efficient, and ethical utilization of these data remain underdeveloped. Emergency care research, in particular, poses several unique challenges in this rapidly evolving field. A group of content experts was recently convened to identify research priorities related to barriers to the application of data science to emergency care research. These recommendations included: 1) developing methods for cross-platform identification and linkage of patients; 2) creating central, deidentified, open-access databases; 3) improving methodologies for visualization and analysis of intensively sampled data; 4) developing methods to identify and standardize electronic medical record data quality; 5) improving and utilizing natural language processing; 6) developing and utilizing syndrome or complaint-based based taxonomies of disease; 7) developing practical and ethical framework to leverage electronic systems for controlled trials; 8) exploring technologies to help enable clinical trials in the emergency setting; and 9) training emergency care clinicians in data science and data scientists in emergency care medicine. The background, rationale, and conclusions of these recommendations are included in the present article.
#> 104
#> 105 Recent advances in data science were used capitalize on the extensive quantity of data available in electronic health records to predict patient aggressive events. This retrospective study utilized electronic health records (N = 29,841) collected between January 2010 and December 2015 at Harris County Psychiatric Center, a 274-bed safety net community psychiatric facility. The primary outcome of interest was the presence (1.4%) versus absence (98.6%) of an aggressive event toward staff or patients. The best-performing algorithm, penalized generalized linear modeling, achieved an area under the curve = 0.7801. The strongest predictors of patient aggressive events included homelessness (b = 0.52), having been convicted of assault (b = 0.31), and having witnessed abuse (b = -0.28). The algorithm was also used to generate a cost-optimized probability threshold (6%) for an aggressive event, theoretically affording individualized hospital-staff coverage on the 2.8% of inpatients at highest risk for aggression, based on available hospital operating costs. The present research demonstrated the utility of a data science approach to better understand a high-priority event in psychiatric inpatient settings.
#> 106
#> 107 The paradigm shift to a knowledge-based economy has incremented the use of personal information applied to health-related activities, such as biomedical research, innovation, and commercial initiatives. The convergence of science, technology, communication and data technologies has given rise to the application of big data to health; for example through eHealth, human databases and biobanks. In light of these changes, we enquire about the value of personal data and its appropriate use. In order to illustrate the complex ground on which big data applied to health develops, we analyse the current situation of the European Union and two cases: the Catalan VISC+/PADRIS and the UK Biobank, as perspectives. Personal health-related data in the context of the European Union is being increasingly used for big data projects under diverse schemes. There, public and private sectors participate distinctively or jointly, pursuing very different goals which may conflict with individual rights, notably privacy. Given that, this paper advocates for stopping the unjustified accumulation and commercialisation of personal data, protecting the interests of citizens and building appropriate frameworks to govern big data projects for health. 
A core tool for achieving such goals is to develop consent mechanisms which allow truly informed but adaptable consent, conjugated with the engagement of donors, participants and society.
#> 108
#> 109 Inhospital pediatric trauma care typically spans multiple locations, which influences the use of resources, that could be improved by gaining a better understanding of the inhospital flow of patients and identifying opportunities for improvement. To describe a process mining approach for mapping the inhospital flow of pediatric trauma patients, to identify and characterize the major patient pathways and care transitions, and to identify opportunities for patient flow and triage improvement. The process map for the cohort was similar to a validated process map derived through qualitative methods. 
The process map for Bravo encounters had a relatively low fitness of 0.887, and 96 (5.6%) encounters were identified as nonconforming with characteristics comparable to Alpha encounters. In total, 28 patient pathways and 20 care transitions were identified. The top five patient pathways were traversed by 92.1% of patients, whereas the top five care transitions accounted for 87.5% of all care transitions. A larger-than-expected number of discharges from the pediatric intensive care unit (PICU) were identified, with 84.2% involving discharge to home without the need for home care services. Process mining was successfully applied to derive process maps from trauma registry data and to identify opportunities for trauma triage improvement and optimization of PICU use.
#> 110 Parts I through III of this paper will examine several, increasingly comprehensive forms of aggregation, ranging from insurance reimbursement "lock-in" programs to PDMPs to completely unified electronic medical records (EMRs). Each part will advocate for the adoption of these aggregation systems and provide suggestions for effective implementation in the fight against opioid misuse. All PDMPs are not made equal, however, and Part II will, therefore, focus on several elements - mandating prescriber usage, streamlining the user interface, ensuring timely data uploads, creating a national data repository, mitigating privacy concerns, and training doctors on how to respond to perceived doctor-shopping - that can make these systems more effective. In each part, we will also discuss the privacy concerns of aggregating data, ranging from minimal to significant, and highlight the unique role of stigma in motivating these concerns. In Part IV, we will conclude by suggesting remedial steps to offset this loss of privacy and to combat the stigma around SUDs and mental health disorders in general.
#> 111 OBJECTIVE: To summarize the recent public and population health informatics literature with a focus on the synergistic "bridging" of electronic data to benefit communities and other populations. METHODS: The review was primarily driven by a search of the literature from July 1, 2016 to September 30, 2017. The search included articles indexed in PubMed using subject headings with (MeSH) keywords "public health informatics" and "social determinants of health". The "social determinants of health" search was refined to include articles that contained the keywords "public health", "population health" or "surveillance". RESULTS: Several categories were observed in the review focusing on public health's socio-technical infrastructure: evaluation of surveillance practices, surveillance methods, interoperable health information infrastructure, mobile health, social media, and population health. Common trends discussing socio-technical infrastructure included big data platforms, social determinants of health, geographical information systems, novel data sources, and new visualization techniques. A common thread connected these categories of workforce, governance, and sustainability: using clinical resources and data to bridge public and population health. CONCLUSIONS: Both medical care providers and public health agencies are increasingly using informatics and big data tools to create and share digital information. The intent of this "bridging" is to proactively identify, monitor, and improve a range of medical, environmental, and social factors relevant to the health of communities. These efforts show a significant growth in a range of population health-centric information exchange and analytics activities.
#> 112 Every year about three million Muslims visit the Holy City of Makkah in Saudi Arabia to perform the Hajj. Because of the large number of people present during this period, pilgrims can be subjected to many health hazards. An adequate system to minimize these health hazards is needed to support the pilgrims who attend the Hajj. This study justifies the need for developing a large data-based m-Health application to identify the health hazards encountered during the Hajj. In developing a big data-based m-Health application, this study follows the framework suggested by Hevner. The design of the science framework allows the development of a technological solution (i.e., design artifact) of the problem through a series of actions. 
The design involves rigorous knowledge of the environmental factors, including knowledge of the construction and evaluation of technological solutions, that are important and relevant to an existing problem. Based on the design science framework, the process of artifact development can be classified into Artifact Design, Artifact Implementation, and Artifact Evaluation. This paper presents the Artifact Design step for the design of the big data-based m-Health application, which has an Environmental Relevance Cycle, a Knowledge-based rigor Cycle, and an Artifice development and design cycle. The big data-based m-Health application is a prototype and must be evaluated using the evaluation-and-feedback loop process until the optimum artifact is completely built and integrated into the system. Development of a big data-based m-Health application using a design science framework can support the effective and comprehensive plan of the government of Saudi Arabia for preventing and managing Hajj-related health issues. 
Our proposed model for developing and designing a big data-based m-Health application could provide direction for developing the most advanced solution for dealing with the Hajj-related health issues in the future.
#> 113
#> 114 In November, 2014, a cluster of HIV infections was detected among people who inject drugs in Scott County, IN, USA, with 215 HIV infections eventually attributed to the outbreak. This study examines whether earlier implementation of a public health response could have reduced the scale of the outbreak. In this modelling study, we derived weekly case data from the HIV outbreak in Scott County, IN, and on the uptake of HIV testing, treatment, and prevention services from publicly available reports from the US Centers for Disease Control and Prevention (CDC) and researchers from Indiana. Our primary objective was to determine if an earlier response to the outbreak could have had an effect on the number of people infected. We computed upper and lower bounds for cumulative HIV incidence by digitally extracting data from published images from a CDC study using Bio-Rad avidity incidence testing to estimate the recency of each transmission event. We constructed a generalisation of the susceptible-infectious-removed model to capture the transmission dynamics of the HIV outbreak. We computed non-parametric interval estimates of the number of individuals with an undiagnosed HIV infection, the case-finding rate per undiagnosed HIV infection, and model-based bounds for the HIV transmission rate throughout the epidemic. 
We used these models to assess the potential effect if the same intervention had begun at two key timepoints earlier than the actual date of the initiation of efforts to control the outbreak. The upper bound for undiagnosed HIV infections in Scott County peaked at 126 around Jan 10, 2015, over 2 months before the Governor of Indiana declared a public health emergency on March 26, 2015. Applying the observed case-finding rate scale-up to earlier intervention times suggests that an earlier public health response could have substantially reduced the total number of HIV infections (estimated to have been 183-184 infections by Aug 11, 2015). 
Initiation of a response on Jan 1, 2013, could have suppressed the number of infections to 56 or fewer, averting at least 127 infections; whereas an intervention on April 1, 2011, could have reduced the number of infections to ten or fewer, averting at least 173 infections. Early and robust surveillance efforts and case finding alone could reduce nascent epidemics. Ensuring access to HIV services and harm-reduction interventions could further reduce the likelihood of outbreaks, and substantially mitigate their severity and scope. US National Institute on Drug Abuse, US National Institutes of Mental Health, US National Institutes of Health Big Data to Knowledge programme, and the US National Institutes of Health.
#> 115
#> 116 Data science has emerged from the proliferation of digital data, coupled with advances in algorithms, software and hardware (e.g., GPU computing). Innovations in structural biology have been driven by similar factors, spurring us to ask: can these two fields impact one another in deep and hitherto unforeseen ways? We posit that the answer is yes. New biological knowledge lies in the relationships between sequence, structure, function and disease, all of which play out on the stage of evolution, and data science enables us to elucidate these relationships at scale. Here, we consider the above question from the five key pillars of data science: acquisition, engineering, analytics, visualization and policy, with an emphasis on machine learning as the premier analytics approach.
#> 117 Ready data availability, cheap storage capacity, and powerful tools for extracting information from data have the potential to significantly enhance the human condition. However, as with all advanced technologies, this comes with the potential for misuse. Ethical oversight and constraints are needed to ensure that an appropriate balance is reached. Ethical issues involving data may be more challenging than the ethical challenges of some other advanced technologies partly because data and data science are ubiquitous, having the potential to impact all aspects of life, and partly because of their intrinsic complexity. We explore the nature of data, personal data, data ownership, consent and purpose of use, trustworthiness of data as well as of algorithms and of those using the data, and matters of privacy and confidentiality. A checklist is given of topics that need to be considered.
#> 118 We develop a number of data-driven investment strategies that demonstrate how machine learning and data analytics can be used to guide investments in peer-to-peer loans. We detail the process starting with the acquisition of (real) data from a peer-to-peer lending platform all the way to the development and evaluation of investment strategies based on a variety of approaches. We focus heavily on how to apply and evaluate the data science methods, and resulting strategies, in a real-world business setting. The material presented in this article can be used by instructors who teach data science courses, at the undergraduate or graduate levels. Importantly, we go beyond just evaluating predictive performance of models, to assess how well the strategies would actually perform, using real, publicly available data. Our treatment is comprehensive and ranges from qualitative to technical, but is also modular-which gives instructors the flexibility to focus on specific parts of the case, depending on the topics they want to cover. The learning concepts include the following: data cleaning and ingestion, classification/probability estimation modeling, regression modeling, analytical engineering, calibration curves, data leakage, evaluation of model performance, basic portfolio optimization, evaluation of investment strategies, and using Python for data science.
#> 119
#> 120
#> 121 Spatio-temporal data are more ubiquitous and richer than even before and the availability of such data poses great challenges in data analytics. Ecological facilitation, the positive effect of density of individuals on the individual's survival across a stress gradient, is a complex phenomenon. A large number of tree individuals coupled with soil moisture, temperature, and water stress data across a long temporal period were followed. Data-driven analysis in the absence of hypothesis was performed. Information theoretic analysis of multiple statistical models was employed in order to quantify the best data-driven index of vegetation density and spatial scale of interactions. Sequentially, tree survival was quantified as a function of the size of the individual, vegetation density, and time at the optimal spatial interaction scale. Land surface temperature and soil moisture were also statistically explained by tree size, density, and time. Results indicated that in space both facilitation and competition co-exist in the same ecosystem and the sign and magnitude of this depend on the spatial scale. Overall, within the optimal data-driven spatial scale, tree survival was best explained by the interaction between density and year, sifting overall from facilitation to competition through time. However, small sized trees were always facilitated by increased densities, while large sized trees had either negative or no density effects. Tree size was more important predictor than density in survival and this has implications for nature-based solutions: maintaining large tree individuals or planting species that can become large-sized can safeguard against tree-less areas by promoting survival at long time periods through harsh environmental conditions. Large trees had also a significant effect in moderating land surface temperature and this effect was higher than the one of vegetation density on temperature.
#> 122 OBJECTIVES: Pertussis is a vaccine-preventable disease. Despite this, it remains a major health problem among children in developing countries and in recent years, has re-emerged and has led to considerable outbreaks. Pertussis surveillance is of paramount importance; however, classical monitoring approaches are plagued by some shortcomings, such as considerable time delay and potential underestimation/underreporting of cases. STUDY DESIGN: This study aims at investigating the possibility of using Google Trends (GT) as an instrument for tracking pertussis outbreaks to see if infodemiology and infoveillance approaches could overcome the previously mentioned issues because they are based on real-time monitoring and tracking of web-related activities. METHODS: In the present study, GT was mined from inception (01 January 2004) to 31 December 2015 in the different European countries. Pertussis was searched using the 'search topic' strategy. Pertussis-related GT figures were correlated with the number of pertussis cases and deaths retrieved from the European Centre for Disease prevention and Control database. RESULTS: At the European countries level, correlation between pertussis cases and GT-based search volumes was very large (ranging from 0.94 to 0.97) from 2004 to 2015. When examining each country, however, only a few reached the threshold of statistical significance. CONCLUSIONS: GT could be particularly useful in pertussis surveillance and control, provided that the algorithm is better adjusted and refined at the country level.
#> 123 Root cause analysis (RCA) is one of the most prominent tools used to comprehensively evaluate a biopharmaceutical production process. Despite of its widespread use in industry, the Food and Drug Administration has observed a lot of unsuitable approaches for RCAs within the last years. The reasons for those unsuitable approaches are the use of incorrect variables during the analysis and the lack in process understanding, which impede correct model interpretation. Two major approaches to perform RCAs are currently dominating the chemical and pharmaceutical industry: raw data analysis and feature-based approach. Both techniques are shown to be able to identify the significant variables causing the variance of the response. Although they are different in data unfolding, the same tools as principal component analysis and partial least square regression are used in both concepts. Within this article we demonstrate the strength and weaknesses of both approaches. We proved that a fusion of both results in a comprehensive and effective workflow, which not only increases better process understanding. We demonstrate this workflow along with an example. Hence, the presented workflow allows to save analysis time and to reduce the effort of data mining by easy detection of the most important variables within the given dataset. Subsequently, the final obtained process knowledge can be translated into new hypotheses, which can be tested experimentally and thereby lead to effectively improving process robustness.
#> 124 Adolescent idiopathic scoliosis (AIS) is a three-dimensional (3D) deformity of the spinal column. For progressive deformities in AIS, the spinal fusion surgery aims to correct and stabilize the deformity; however, common surgical planning approaches based on the 2D X-rays and subjective surgical decision-making have been challenged by poor clinical outcomes. As the suboptimal surgical outcomes can significantly impact the cost, risk of revision surgery, and long-term rehabilitation of adolescent patients, objective patient-specific models that predict the outcome of different treatment scenarios are in high demand. 3D classification of the spinal curvature and identifying the key surgical parameters influencing the outcomes are required for such models. Here, we show that K-means clustering of the isotropically scaled 3D spinal curves provides an effective, data-driven method for classification of patients. We further propose, and evaluate in 67 right thoracic AIS patients, that by knowing the patients' pre-operative and early post-operation clusters and the vertebral levels which were instrumented during the surgery, the two-year outcome cluster can be determined. This framework, once applied to a larger heterogeneous patient dataset, can further isolate the key surgeon-modifiable parameters and eventually lead to a patient-specific predictive model based on a limited number of factors determinable prior to surgery.
#> 125 Using predictive modeling techniques, we developed and compared appointment no-show prediction models to better understand appointment adherence in underserved populations. We collected electronic health record (EHR) data and appointment data including patient, provider and clinical visit characteristics over a 3-year period. All patient data came from an urban system of community health centers (CHCs) with 10 facilities. We sought to identify critical variables through logistic regression, artificial neural network, and naïve Bayes classifier models to predict missed appointments. We used 10-fold cross-validation to assess the models' ability to identify patients missing their appointments. Following data preprocessing and cleaning, the final dataset included 73811 unique appointments with 12,392 missed appointments. Predictors of missed appointments versus attended appointments included lead time (time between scheduling and the appointment), patient prior missed appointments, cell phone ownership, tobacco use and the number of days since last appointment. Models had a relatively high area under the curve for all 3 models (e.g., 0.86 for naïve Bayes classifier). Patient appointment adherence varies across clinics within a healthcare system. Data analytics results demonstrate the value of existing clinical and operational data to address important operational and management issues. EHR data including patient and scheduling information predicted the missed appointments of underserved populations in urban CHCs. Our application of predictive modeling techniques helped prioritize the design and implementation of interventions that may improve efficiency in community health centers for more timely access to care. CHCs would benefit from investing in the technical resources needed to make these data readily available as a means to inform important operational and policy questions.
#> 126 Over a century ago, Abraham Flexner's landmark report on medical education resulted in the most extensive reforms of medical training in history. They led to major advances in the diagnosis and treatment of disease and the relief of suffering. His prediction that "the physician's function is fast becoming social and preventive, rather than individual and curative," however, was never realized. Instead, with the rise of biomedical science, the scientific method and the American Medical Association, the health care system became increasingly distanced from a holistic approach to life that recognizes the critical role social determinants play in people's health. These developments created the beginning of the regulatory controls that have come to define and shape American health care - and our unhealthy obsession with illness, disease and curative medicine that has resulted in a system that has little to do with health. To realize Flexner's prediction, and to transform health care into a holistic system whose primary goals are focused on health outcomes, six disruptive interventions are proposed. First, health needs to be placed in the context of community. Second, the model of primary care needs to be revised. Third, big data need to be harnessed to provide personalized, consumable, and actionable health knowledge. Fourth, there needs to greater patient engagement, but with fewer face-to-face encounters. Fifth, we need revitalized, collaborative medical training for physicians. And finally, true transformation will require market-driven, not regulatory-constrained, innovation. The evolution from health care to health demands consumer-driven choices that only a deregulated, free market can provide.
#> 127 Depressive symptoms may contribute to cocaine use. However, tests of the relationship between depression and severity of cocaine use have produced mixed results, possibly due to heterogeneity in individual symptoms of depression. Our goal was to establish which symptoms of depression are most strongly related to frequency of cocaine use (one aspect of severity) in a large sample of current cocaine users. We utilized generalized additive modeling to provide data-driven exploration of the relationships between depressive symptoms and cocaine use, including examination of non-linearity. We hypothesized that symptoms related to anhedonia would demonstrate the strongest relationship to cocaine use. 772 individuals screened for cocaine use disorder treatment studies. To measure depressive symptoms, we used the items of the Beck Depression Inventory, 2nd Edition. Cocaine use frequency was measured as proportion of self-reported days of cocaine use over the last 30 days using the Addiction Severity Index. Models identified 18 significant predictors of past-30-day cocaine use. The strongest predictors were Crying, Pessimism, Changes in Appetite, Indecisiveness, and Loss of Interest. Noteworthy effect sizes were found for specific response options on Suicidal Thoughts, Worthlessness, Agitation, Concentration Difficulty, Tiredness, and Self Dislike items. The strongest predictors did not conform to previously hypothesized "subtypes" of depression. Non-linear relationships between items and use were typical, suggesting BDI-II items may not be monotonically increasing ordinal measures with respect to predicting cocaine use. Qualitative analysis of strongly predictive response options suggested emotional volatility and disregard for the future as important predictors of use.
#> 128 Healthcare organizations have invested significant resources into integrating comprehensive electronic health record (EHR) systems into clinical care. EHRs digitize healthcare in ways that allow for repurposing of clinical information to support quality improvement, research, population health, and health system analytics. This has facilitated the development of Learning Health Systems. Learning health systems (LHS) merge healthcare delivery with research, data science, and quality improvement processes. The LHS cycle begins and ends with the clinician-patient interaction, and aspires to provide continuous improvements in quality, outcomes, and health care efficiency. Although, the health sector has been slow to embrace the LHS concept, innovative approaches for improving healthcare, such as a LHS, have shown that better outcomes can be achieved by engaging patients and physicians in communities committed to a common purpose. Here, we explore the mission of a pediatric LHS, such as PEDSnet, which is driven by the distinctive goals of a child's well-being. Its vision is to create a national LHS architecture in which all pediatric institutions can participate. While challenges still exist in the development and adoption of LHS, these challenges are being met with innovative strategies and strong collaborative relationships to reduce system uncertainty while improving patient outcomes.
#> 129 Demands for constant upgrades to already-installed electronic health record systems are slowing investment in other important digital technologies like telehealth, remote patient monitoring and online billing.
#> 130 Rapid research progress in science and technology (S&T) and continuously shifting workforce needs exert pressure on each other and on the educational and training systems that link them. Higher education institutions aim to equip new generations of students with skills and expertise relevant to workforce participation for decades to come, but their offerings sometimes misalign with commercial needs and new techniques forged at the frontiers of research. Here, we analyze and visualize the dynamic skill (mis-)alignment between academic push, industry pull, and educational offerings, paying special attention to the rapidly emerging areas of data science and data engineering (DS/DE). The visualizations and computational models presented here can help key decision makers understand the evolving structure of skills so that they can craft educational programs that serve workforce needs. Our study uses millions of publications, course syllabi, and job advertisements published between 2010 and 2016. We show how courses mediate between research and jobs. We also discover responsiveness in the academic, educational, and industrial system in how skill demands from industry are as likely to drive skill attention in research as the converse. Finally, we reveal the increasing importance of uniquely human skills, such as communication, negotiation, and persuasion. These skills are currently underexamined in research and undersupplied through education for the labor market. In an increasingly data-driven economy, the demand for "soft" social skills, like teamwork and communication, increase with greater demand for "hard" technical skills and tools.
#> 131 Artificial intelligence and automation are topics dominating global discussions on the future of professional employment, societal change, and economic performance. In this paper, we describe fundamental concepts underlying AI and Big Data and their significance to public health. We highlight issues involved and describe the potential impacts and challenges to medical professionals and diagnosticians. The possible benefits of advanced data analytics and machine learning are described in the context of recently reported research. Problems are identified and discussed with respect to ethical issues and the future roles of professionals and specialists in the age of artificial intelligence.
#> 132 A challenging problem in systems biology is the reconstruction of gene regulatory networks from postgenomic data. A variety of reverse engineering methods from machine learning and computational statistics have been proposed in the literature. However, deciding on the best method to adopt for a particular application or data set might be a confusing task. The present chapter provides a broad overview of state-of-the-art methods with an emphasis on conceptual understanding rather than a deluge of mathematical details, and the pros and cons of the various approaches are discussed. Guidance on practical applications with pointers to publicly available software implementations are included. The chapter concludes with a comprehensive comparative benchmark study on simulated data and a real-work application taken from the current plant systems biology.
#> 133 Deaths due to road traffic accidents (RTAs) are a major public health concern around the world. Developing countries are over-represented in these statistics. Punitive measures are traditionally employed to lower RTA related behavioural risk factors. These are, however, resource intensive and require infrastructure development. This is a randomised controlled study to investigate the effect of non-punitive behavioural intervention through peer-comparison feedback based on driver behaviour data gathered by an in-vehicle telematics device. A randomised controlled trial using repeated measures design conducted in Iran on the drivers of 112 public transport taxis in Tehran province and 1309 inter-city busses operating nationwide. Driving data is captured by an in-vehicle telematics device and sent to a centrally located data centre using a mobile network. The telematics device is installed in all vehicles. Participants are males aged above 20 who have had the device operating in their vehicles for at least 3 months prior to the start of the trial. The study had three stages: 1- Driver performance was monitored for a 4-week period after which they were randomised into intervention and control groups. 2- Their performance was monitored for a 9-week period. At the end of each week, drivers in the intervention group received a scorecard and a note informing them of their weekly behaviour and ranking within their peer group. Drivers in the control group received no feedback via short messaging service (SMS). 3- Drivers did not receive further feedback and their behaviour was monitored for another 4 weeks. Primary outcome was changes in weekly driving score in intervention and control groups during stage 2 of intervention. Taxis and busses were analysed separately using generalised estimating equation analysis. This project was funded by the National Institute for Medical Research Development (Grant No.940576) and approved by its ethics committee (Code: IR.NIMAD.REC.1394.016). This trial was registered at www.irct.ir as IRCT20180708040391N1.
#> 134 The rising popularity of social media, since their inception around 20 years ago, has been echoed in the growth of health-related research using data derived from them. This has created a demand for literature reviews to synthesise this emerging evidence base and inform future activities. Existing reviews tend to be narrow in scope, with limited consideration of the different types of data, analytical methods and ethical issues involved. There has also been a tendency for research to be siloed within different academic communities (eg, computer science, public health), hindering knowledge translation. To address these limitations, we will undertake a comprehensive scoping review, to systematically capture the broad corpus of published, health-related research based on social media data. Here, we present the review protocol and the pilot analyses used to inform it. A version of Arksey and O'Malley's five-stage scoping review framework will be followed: (1) identifying the research question; (2) identifying the relevant literature; (3) selecting the studies; (4) charting the data and (5) collating, summarising and reporting the results. To inform the search strategy, we developed an inclusive list of keyword combinations related to social media, health and relevant methodologies. The frequency and variability of terms were charted over time and cross referenced with significant events, such as the advent of Twitter. Five leading health, informatics, business and cross-disciplinary databases will be searched: PubMed, Scopus, Association of Computer Machinery, Institute of Electrical and Electronics Engineers and Applied Social Sciences Index and Abstracts, alongside the Google search engine. There will be no restriction by date. The review focuses on published research in the public domain therefore no ethics approval is required. The completed review will be submitted for publication to a peer-reviewed, interdisciplinary open access journal, and conferences on public health and digital research.
#> 135 OBJECTIVES: To compare information sharing of over 379 health conditions on Twitter to uncover trends and patterns of online user activities. METHODS: We collected 1.5 million tweets generated by over 450,000 Twitter users for 379 health conditions, each of which was quantified using a multivariate model describing engagement, user and content aspects of the data and compared using correlation and network analysis to discover patterns of user activities in these online communities. RESULTS: We found a significant imbalance in terms of the size of communities interested in different health conditions, regardless of the seriousness of these conditions. Improving the informativeness of tweets by using, for example, URLs, multimedia and mentions can be important factors in promoting health conditions on Twitter. Using hashtags on the contrary is less effective. Social network analysis revealed similar structures of the discussion found across different health conditions. CONCLUSIONS: Our study found variance in activity between different health communities on Twitter, and our results are likely to be of interest to public health authorities and officials interested in the potential of Twitter to raise awareness of public health.
#> 136 Nowadays, trendy research in biomedical sciences juxtaposes the term 'precision' to medicine and public health with companion words like big data, data science, and deep learning. Technological advancements permit the collection and merging of large heterogeneous datasets from different sources, from genome sequences to social media posts or from electronic health records to wearables. Additionally, complex algorithms supported by high-performance computing allow one to transform these large datasets into knowledge. Despite such progress, many barriers still exist against achieving precision medicine and precision public health interventions for the benefit of the individual and the population. The present work focuses on analyzing both the technical and societal hurdles related to the development of prediction models of health risks, diagnoses and outcomes from integrated biomedical databases. Methodological challenges that need to be addressed include improving semantics of study designs: medical record data are inherently biased, and even the most advanced deep learning's denoising autoencoders cannot overcome the bias if not handled a priori by design. Societal challenges to face include evaluation of ethically actionable risk factors at the individual and population level; for instance, usage of gender, race, or ethnicity as risk modifiers, not as biological variables, could be replaced by modifiable environmental proxies such as lifestyle and dietary habits, household income, or access to educational resources. Data science for precision medicine and public health warrants an informatics-oriented formalization of the study design and interoperability throughout all levels of the knowledge inference process, from the research semantics, to model development, and ultimately to implementation.
#> 137 BACKGROUND: Urban form interventions can result in positive and negative impacts on physical activity, social participation, and well-being, and inequities in these outcomes. Natural experiment studies can advance our understanding of causal effects and processes related to urban form interventions. The INTErventions, Research, and Action in Cities Team (INTERACT) is a pan-Canadian collaboration of interdisciplinary scientists, urban planners, and public health decision makers advancing research on the design of healthy and sustainable cities for all. Our objectives are to use natural experiment studies to deliver timely evidence about how urban form interventions influence health, and to develop methods and tools to facilitate such studies going forward. METHODS: INTERACT will evaluate natural experiments in four Canadian cities: the Arbutus Greenway in Vancouver, British Columbia; the All Ages and Abilities Cycling Network in Victoria, BC; a new Bus Rapid Transit system in Saskatoon, Saskatchewan; and components of the Sustainable Development Plan 2016-2020 in Montreal, Quebec, a plan that includes urban form changes initiated by the city and approximately 230 partnering organizations. We will recruit a cohort of between 300 and 3000 adult participants, age 18 or older, in each city and collect data at three time points. Participants will complete health and activity space surveys and provide sensor-based location and physical activity data. We will conduct qualitative interviews with a subsample of participants in each city. Our analysis methods will combine machine learning methods for detecting transportation mode use and physical activity, use temporal Geographic Information Systems to quantify changes to urban intervention exposure, and apply analytic methods for natural experiment studies including interrupted time series analysis. DISCUSSION: INTERACT aims to advance the evidence base on population health intervention research and address challenges related to big data, knowledge mobilization and engagement, ethics, and causality. We will collect ~ 100 TB of sensor data from participants over 5 years. We will address these challenges using interdisciplinary partnerships, training of highly qualified personnel, and modern methodologies for using sensor-based data.
#> 138 This paper contends that a research ethics approach to the regulation of health data research is unhelpful in the era of population-level research and big data because it results in a primary focus on consent (meta-, broad, dynamic and/or specific consent). Two recent guidelines - the 2016 WMA Declaration of Taipei on ethical considerations regarding health databases and biobanks and the revised CIOMS International ethical guidelines for health-related research involving humans - both focus on the growing reliance on health data for research. But as research ethics documents, they remain (to varying degrees) focused on consent and individual control of data use. Many current and future uses of health data make individual consent impractical, if not impossible. Many of the risks of secondary data use apply to communities and stakeholders rather than individual data subjects. Shifting from a research ethics perspective to a public health lens brings a different set of issues into view: how are the benefits and burdens of data use distributed, how can data research empower communities, who has legitimate decision-making capacity? I propose that a public health ethics framework - based on public benefit, proportionality, equity, trust and accountability - provides more appropriate tools for assessing the ethical uses of health data. The main advantage of a public health approach for data research is that it is more likely to foster debate about power, justice and equity and to highlight the complexity of deciding when data use is in the public interest.
#> 139 Large cohort study gained its popularity in biomedical research and demonstrated its application in exploring disease etiology and pathogenesis, improving the prognosis of disease, as well as reducing the burden of diseases. Data science is an interdisciplinary field that uses scientific methods from computer science and statistics to extract insights or knowledge from data in a specific domain. The results from the combination of the two would provide new evidence for developing the strategies and measures on disease prevention and control. This review included a brief introduction of data science, descriptions on characteristics of large cohort data according to the development of the study design, and application of data science at each stage of a large cohort study, as well as prospected the application of data science in the future large cohort studies.
#> 140 The value of using population data to answer important questions for individual and societal benefit has never been greater. Governments and research funders world-wide are recognizing this potential and making major investments in data-intensive initiatives. However, there are challenges to overcome so that safe, socially-acceptable data sharing can be achieved. This paper outlines the field of population data science, the International Population Data Linkage Network (IPDLN), and their roles in advancing data-intensive research. We provide an overview of core concepts and major challenges for data-intensive research, with a particular focus on ethical, legal, and societal implications (ELSI). Using international case studies, we show how challenges can be addressed and lessons learned in advancing the safe, socially-acceptable use of population data for public benefit. Based on the case studies, we discuss the common ELSI principles in operation, we illustrate examples of a data scrutiny panel and a consumer panel, and we propose a set of ELSI-based recommendations to inform new and developing data-intensive initiatives. We conclude that although there are many ELSI issues to be overcome, there has never been a better time or more potential to leverage the benefits of population data for public benefit. A variety of initiatives, with different operating models, have pioneered the way in addressing many challenges. However, the work is not static, as the ELSI environment is constantly evolving, thus requiring continual mutual learning and improvement via the IPDLN and beyond.
#> 141 The value of using population data to answer important questions for individual and societal benefit has never been greater. Governments and research funders world-wide are recognizing this potential and making major investments in data-intensive initiatives. However, there are challenges to overcome so that safe, socially-acceptable data sharing can be achieved. This paper outlines the field of population data science, the International Population Data Linkage Network (IPDLN), and their roles in advancing data-intensive research. We provide an overview of core concepts and major challenges for data-intensive research, with a particular focus on ethical, legal, and societal implications (ELSI). Using international case studies, we show how challenges can be addressed and lessons learned in advancing the safe, socially-acceptable use of population data for public benefit. Based on the case studies, we discuss the common ELSI principles in operation, we illustrate examples of a data scrutiny panel and a consumer panel, and we propose a set of ELSI-based recommendations to inform new and developing data-intensive initiatives. We conclude that although there are many ELSI issues to be overcome, there has never been a better time or more potential to leverage the benefits of population data for public benefit. A variety of initiatives, with different operating models, have pioneered the way in addressing many challenges. However, the work is not static, as the ELSI environment is constantly evolving, thus requiring continual mutual learning and improvement via the IPDLN and beyond.
#> 142 Antimicrobial resistance continues to outpace the development of new chemotherapeutics. Novel pathogens continue to evolve and emerge. Public health innovation has the potential to open a new front in the war of "our wits against their genes" (Joshua Lederberg). Dense sampling coupled to next generation sequencing can increase the spatial and temporal resolution of microbial characterization while sensor technologies precisely map physical parameters relevant to microbial survival and spread. Microbial, physical, and epidemiological big data could be combined to improve prospective risk identification. However, applied in the wrong way, these approaches may not realize their maximum potential benefits and could even do harm. Minimizing microbial-human interactions would be a mistake. There is evidence that microbes previously thought of at best "benign" may actually enhance human health. Benign and health-promoting microbiomes may, or may not, spread via mechanisms similar to pathogens. Infectious vaccines are approaching readiness to make enhanced contributions to herd immunity. The rigorously defined nature of infectious vaccines contrasts with indigenous "benign or health-promoting microbiomes" but they may converge. A "microbial Neolithic revolution" is a possible future in which human microbial-associations are understood and managed analogously to the macro-agriculture of plants and animals. Tradeoffs need to be framed in order to understand health-promoting potentials of benign, and/or health-promoting microbiomes and infectious vaccines while also discouraging pathogens. Super-spreaders are currently defined as individuals who play an outsized role in the contagion of infectious disease. A key unanswered question is whether the super-spreader concept may apply similarly to health-promoting microbes. 
The complex interactions of individual rights, community health, pathogen contagion, the spread of benign, and of health-promoting microbiomes including infectious vaccines require study. Advancing the detailed understanding of heterogeneity in microbial spread is very likely to yield important insights relevant to public health.
#> 143
#> 144 In the 21st century, public health is not only about fighting infectious diseases, but also contributing to a "multidimensional" well-being of people (health promotion, non-communicable diseases, the role of citizens and people in the health system etc.). Six themes of public health, issues of the 21st century will be addressed. Climate change is already aggravating already existing health risks, heat waves, natural disasters, recrudescence of infectious diseases. Big data is the collection and management of databases characterized by a large volume, a wide variety of data types from various sources and a high speed of generation. Big data permits a better prevention and management of disease in patients, the development of diagnostic support systems and the personalization of treatments. Big data raises important ethical questions. Health literacy includes the abilities of people to assess and critique and appropriate health information. Implementing actions to achieve higher levels of health literacy in populations remains a crucial issue. Since the 2000s, migration flows of health professionals have increased mainly in the "south-north" direction. India is the country with the most doctors outside its borders. The USA and the UK receive 80% of foreign doctors worldwide. Ways have been identified to try to regulate the migratory phenomena of health professionals around the world. The mobilization of citizen, health system users and patient associations is a strong societal characteristic over the last 30 years. In a near future, phenomena will combine to increase the need for accompaniment of patient or citizen to protect health, such increase of the prevalence of chronic diseases, reinforcement of care trajectories, medico-social care pathways, and importance of health determinants. Interventional research in public health is very recent. 
It is based on experimentation and on the capitalization of field innovations and uses a wide range of scientific disciplines, methods and tools. It is an interesting tool in the arsenal of public health research. It is essential today to be able to identify the multiple challenges that health systems will face in the coming years, to anticipate changes, and to explore possible futures.
#> 145 In the 21st century, public health is not only about fighting infectious diseases, but also contributing to a "multidimensional" well-being of people (health promotion, non-communicable diseases, the role of citizens and people in the health system etc.). Six themes of public health, issues of the 21st century will be addressed. Climate change is already aggravating already existing health risks, heat waves, natural disasters, recrudescence of infectious diseases. Big data is the collection and management of databases characterized by a large volume, a wide variety of data types from various sources and a high speed of generation. Big data permits a better prevention and management of disease in patients, the development of diagnostic support systems and the personalization of treatments. Big data raises important ethical questions. Health literacy includes the abilities of people to assess and critique and appropriate health information. Implementing actions to achieve higher levels of health literacy in populations remains a crucial issue. Since the 2000s, migration flows of health professionals have increased mainly in the "south-north" direction. India is the country with the most doctors outside its borders. The USA and the UK receive 80% of foreign doctors worldwide. Ways have been identified to try to regulate the migratory phenomena of health professionals around the world. The mobilization of citizen, health system users and patient associations is a strong societal characteristic over the last 30 years. In a near future, phenomena will combine to increase the need for accompaniment of patient or citizen to protect health, such increase of the prevalence of chronic diseases, reinforcement of care trajectories, medico-social care pathways, and importance of health determinants. Interventional research in public health is very recent. 
It is based on experimentation and on the capitalization of field innovations and uses a wide range of scientific disciplines, methods and tools. It is an interesting tool in the arsenal of public health research. It is essential today to be able to identify the multiple challenges that health systems will face in the coming years, to anticipate changes, and to explore possible futures.
#> 146 In the 21st century, public health is not only about fighting infectious diseases, but also contributing to a "multidimensional" well-being of people (health promotion, non-communicable diseases, the role of citizens and people in the health system etc.). Six themes of public health, issues of the 21st century will be addressed. Climate change is already aggravating already existing health risks, heat waves, natural disasters, recrudescence of infectious diseases. Big data is the collection and management of databases characterized by a large volume, a wide variety of data types from various sources and a high speed of generation. Big data permits a better prevention and management of disease in patients, the development of diagnostic support systems and the personalization of treatments. Big data raises important ethical questions. Health literacy includes the abilities of people to assess and critique and appropriate health information. Implementing actions to achieve higher levels of health literacy in populations remains a crucial issue. Since the 2000s, migration flows of health professionals have increased mainly in the "south-north" direction. India is the country with the most doctors outside its borders. The USA and the UK receive 80% of foreign doctors worldwide. Ways have been identified to try to regulate the migratory phenomena of health professionals around the world. The mobilization of citizen, health system users and patient associations is a strong societal characteristic over the last 30 years. In a near future, phenomena will combine to increase the need for accompaniment of patient or citizen to protect health, such increase of the prevalence of chronic diseases, reinforcement of care trajectories, medico-social care pathways, and importance of health determinants. Interventional research in public health is very recent. 
It is based on experimentation and on the capitalization of field innovations and uses a wide range of scientific disciplines, methods and tools. It is an interesting tool in the arsenal of public health research. It is essential today to be able to identify the multiple challenges that health systems will face in the coming years, to anticipate changes, and to explore possible futures.
#> 147 BACKGROUND & OBJECTIVE: Nutritional culturomics (NCs) is a specific focus area of culturomics epistemology developing digital humanities and computational linguistics approaches to search for macro-patterns of public interest in food, nutrition and diet choice as a major component of cultural evolution. Cultural evolution is considered as a driver at the interface of environmental and food science, economy and policy. METHODS: The paper presents an epistemic programme that builds on the use of big data from webbased services such as Google Trends, Google Adwords or Google Books Ngram Viewer. RESULTS: A comparison of clearly defined NCs in terms of geography, culture, linguistics, literacy, technological setups or time period might be used to reveal variations and singularities in public's behavior in terms of adaptation and mitigation policies in the agri-food and public health sectors. CONCLUSION: The proposed NC programme is developed along major axes: (1) the definition of an NC; (2) the reconstruction of food and diet histories; (3) the nutrition related epidemiology; (4) the understanding of variability of NCs; (5) the methodological diversification of NCs; (6) the quantifiable limitations and flaws of NCs. A series of indicative examples are presented regarding these NC epistemology components.
#> 148 Recent decades have seen dramatic progress in brain research. These advances were often buttressed by probing single variables to make circumscribed discoveries, typically through null hypothesis significance testing. New ways for generating massive data fueled tension between the traditional methodology that is used to infer statistically relevant effects in carefully chosen variables, and pattern-learning algorithms that are used to identify predictive signatures by searching through abundant information. In this article we detail the antagonistic philosophies behind two quantitative approaches: certifying robust effects in understandable variables, and evaluating how accurately a built model can forecast future outcomes. We discourage choosing analytical tools via categories such as 'statistics' or 'machine learning'. Instead, to establish reproducible knowledge about the brain, we advocate prioritizing tools in view of the core motivation of each quantitative analysis: aiming towards mechanistic insight or optimizing predictive accuracy.
#> 149 We report recent progress in the development of a precision test for individualized use of the VEGF-A targeting drug bevacizumab for treating ovarian cancer. We discuss the discovery model stage (i.e., past feasibility modeling and before conversion to the production test). Main results: (a) Informatics modeling plays a critical role in supporting driving clinical and health economic requirements. (b) The novel computational models support the creation of a precision test with sufficient predictivity to reduce healthcare system costs up to $30 billion over 10 years, and make the use of bevacizumab affordable without loss of length or quality of life.
#> 150 PURPOSE: Our primary goal was to study the use of outpatient attendances by lung cancer patients in Hospital Universitario Puerta de Hierro Majadahonda (HUPHM), Spain, by leveraging our Electronic Patient Record (EPR) and structured clinical registry of lung cancer cases as well as assessing current Data Science methods and tools. PATIENTS: We applied the Cross-Industry Standard Process for Data Mining (CRISP-DM) to integrate and analyze activity data extracted from the EPR (9.3 million records) and clinical data of lung cancer patients from a previous registry that was curated into a new, structured database based on REDCap. We have described and quantified factors with an influence in outpatient care use from univariate and multivariate points of view (through Poisson and negative binomial regression). RESULTS: Three cycles of CRISP-DM were performed resulting in a curated database of 522 lung cancer patients with 133 variables which generated 43,197 outpatient visits and tests, 1538 ER visits and 753 inpatient admissions. Stage and ECOG-PS at diagnosis and Charlson Comorbidity Index were major contributors to healthcare use. We also found that the patients' pattern of healthcare use (even before diagnosis), the existence of a history of cancer in first-grade relatives, smoking habits, or even age at diagnosis, could play a relevant role. CONCLUSIONS: Integrating activity data from EPR and clinical structured data from lung cancer patients and applying CRISP-DM has allowed us to describe healthcare use in connection with clinical variables that could be used to plan resources and improve quality of care.
#> 151 INTRODUCTION: The first computerised national ranking exam (cNRE) in Medicine was introduced in June 2016 for 8214 students. It was made of 18 progressive clinical cases (PCCs) with multiple choice questions (MCQs), 120 independent MCQs and 2 scientific articles to criticize. A lack of mark discrimination grounded the cNRE reform. We aimed to assess the discrimination of the final marks after this first cNRE. RESULTS: The national distribution sigmoid curve of the marks is superimposable with previous NRE in 2015. In PCCs, 72% of students were ranked in 1090 points out of 7560 (14%). In independents MCQs, 73% of students were ranked in 434 points out of 2160 (20%). In critical analysis of articles, 75% of students were ranked in 225 points out of 1080 (21%). The above percentages of students are on the plateau of each discrimination curve for PCCs, independent MCQs and critical analysis of scientific articles. CONCLUSION: The cNRE reduced equally-ranked students compared to 2015, with a mean deviation between two papers of 0.28 in 2016 vs 0.04 in 2015. Despite the new format introduced by the cNRE, 75% of students are still ranked in a low proportion of points that is equivalent to previous NRE in 2015 (between 15 et 20% of points).
#> 152 Big biomedical data create exciting opportunities for discovery, but make it difficult to capture analyses and outputs in forms that are findable, accessible, interoperable, and reusable (FAIR). In response, we describe tools that make it easy to capture, and assign identifiers to, data and code throughout the data lifecycle. We illustrate the use of these tools via a case study involving a multi-step analysis that creates an atlas of putative transcription factor binding sites from terabytes of ENCODE DNase I hypersensitive sites sequencing data. We show how the tools automate routine but complex tasks, capture analysis algorithms in understandable and reusable forms, and harness fast networks and powerful cloud computers to process data rapidly, all without sacrificing usability or reproducibility-thus ensuring that big data are not hard-to-(re)use data. We evaluate our approach via a user study, and show that 91% of participants were able to replicate a complex analysis involving considerable data volumes.
#> 153 BACKGROUND: New occupational hazards and risks are emerging in our progressively globalized society, in which ageing, migration, wild urbanization and rapid economic growth have led to unprecedented biological, chemical and physical exposures, linked to novel technologies, products and duty cycles. A focus shift from worker health to worker/citizen and community health is crucial. One of the major revolutions of the last decades is the computerization and digitization of the work process, the so-called "work 4.0", and of the workplace. OBJECTIVES: To explore the roles and implications of Big Data in the new occupational medicine settings. METHODS: Comprehensive literature search. RESULTS: Big Data are characterized by volume, variety, veracity, velocity, and value. They come both from wet-lab techniques ("molecular Big Data") and computational infrastructures, including databases, sensors and smart devices ("computational Big Data" and "digital Big Data"). CONCLUSIONS: In the light of novel hazards and thanks to new analytical approaches, molecular and digital underpinnings become extremely important in occupational medicine. Computational and digital tools can enable us to uncover new relationships between exposures and work-related diseases; to monitor the public reaction to novel risk factors associated to occupational diseases; to identify exposure-related changes in disease natural history; and to evaluate preventive workplace practices and legislative measures adopted for workplace health and safety.
#> 154 BACKGROUND: The analysis of gene expression levels is used in many clinical studies to know how patients evolve or to find new genetic biomarkers that could help in clinical decision making. However, the techniques and software available for these analyses are not intended for physicians, but for geneticists. However, enabling physicians to make initial discoveries on these data would benefit in the clinical assay development. RESULTS: Melanoma is a highly immunogenic tumor. Therefore, in recent years physicians have incorporated immune system altering drugs into their therapeutic arsenal against this disease, revolutionizing the treatment of patients with an advanced stage of the cancer. This has led us to explore and deepen our knowledge of the immunology surrounding melanoma, in order to optimize the approach. Within this project we have developed a database for collecting relevant clinical information for melanoma patients, including the storage of patient gene expression levels obtained from the NanoString platform (several samples are taken from each patient). The Immune Profiling Panel is used in this case. This database is being exploited through the analysis of the different expression profiles of the patients. This analysis is being done with Python, and a parallel version of the algorithms is available with Apache Spark to provide scalability as needed. CONCLUSIONS: VIGLA-M, the visual analysis tool for gene expression levels in melanoma patients is available at http://khaos.uma.es/melanoma/ . The platform with real clinical data can be accessed with a demo user account, physician, using password physician_test_7634 (if you encounter any problems, contact us at this email address: khaos@lcc.uma.es). The initial results of the analysis of gene expression levels using these tools are providing first insights into the patients' evolution. 
These results are promising, but larger scale tests must be developed once new patients have been sequenced, to discover new genetic biomarkers.
#> 155 The moulding together of artificial intelligence (AI) and the geographic/geographic information systems (GIS) dimension creates GeoAI. There is an emerging role for GeoAI in health and healthcare, as location is an integral part of both population and individual health. This article provides an overview of GeoAI technologies (methods, tools and software), and their current and potential applications in several disciplines within public health, precision medicine, and Internet of Things-powered smart healthy cities. The potential challenges currently facing GeoAI research and applications in health and healthcare are also briefly discussed.
#> 156 Purpose Describe how Ohio and Massachusetts explored severe maternal morbidity (SMM) data, and used these data for increasing awareness and driving practice changes to reduce maternal morbidity and mortality. Description For 2008-2013, Ohio used de-identified hospital discharge records and International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) codes to identify delivery hospitalizations. Massachusetts used existing linked data system infrastructure to identify delivery hospitalizations from birth certificates linked to hospital discharge records. To identify delivery hospitalizations complicated by one or more of 25 SMMs, both states applied an algorithm of ICD-9-CM diagnosis and procedure codes. Ohio calculated a 2013 SMM rate of 144 per 10,000 delivery hospitalizations; Massachusetts calculated a rate of 162. Ohio observed no increase in the SMM rate from 2008 to 2013; Massachusetts observed a 33% increase. Both identified disparities in SMM rates by maternal race, age, and insurance type. Assessment Ohio and Massachusetts engaged stakeholders, including perinatal quality collaboratives and maternal mortality review committees, to share results and raise awareness about the SMM rates and identified high-risk populations. Both states are applying findings to inform strategies for improving perinatal outcomes, such as simulation training for obstetrical emergencies, licensure rules for maternity units, and a focus on health equity. Conclusion Despite data access differences, examination of SMM data informed public health practice in both states. Ohio and Massachusetts maximized available state data for SMM investigation, which other states might similarly use to understand trends, identify high risk populations, and suggest clinical or population level interventions to improve maternal morbidity and mortality.
#> 157
#> 158 Big data and predictive analytics have immense potential to improve risk stratification, particularly in data-rich fields like oncology. This article reviews the literature published on use cases and challenges in applying predictive analytics to improve risk stratification in oncology. We characterized evidence-based use cases of predictive analytics in oncology into three distinct fields: (1) population health management, (2) radiomics, and (3) pathology. We then highlight promising future use cases of predictive analytics in clinical decision support and genomic risk stratification. We conclude by describing challenges in the future applications of big data in oncology, namely (1) difficulties in acquisition of comprehensive data and endpoints, (2) the lack of prospective validation of predictive tools, and (3) the risk of automating bias in observational datasets. If such challenges can be overcome, computational techniques for clinical risk stratification will in short order improve clinical risk stratification for patients with cancer.
#> 159 With regard to fully harvesting the potential of big data, public health lags behind other fields. To determine this potential, we applied big data (air passenger volume from international areas with active chikungunya transmission, Twitter data, and vectorial capacity estimates of Aedes albopictus mosquitoes) to the 2017 chikungunya outbreaks in Europe to assess the risks for virus transmission, virus importation, and short-range dispersion from the outbreak foci. We found that indicators based on voluminous and velocious data can help identify virus dispersion from outbreak foci and that vector abundance and vectorial capacity estimates can provide information on local climate suitability for mosquitoborne outbreaks. In contrast, more established indicators based on Wikipedia and Google Trends search strings were less timely. We found that a combination of novel and disparate datasets can be used in real time to prevent and control emerging and reemerging infectious diseases.
#> 160 Healthcare is a living system that generates a significant volume of heterogeneous data. As healthcare systems are pivoting to value-based systems, intelligent and interactive analysis of health data is gaining significance for health system management, especially for resource optimization whilst improving care quality and health outcomes. Health data analytics is being influenced by new concepts and intelligent methods emanating from artificial intelligence and big data. In this article, we contextualize health data and health data analytics in terms of the emerging trends of artificial intelligence and big data. We examine the nature of health data using the big data criterion to understand "how big" is health data. Next, we explain the working of artificial intelligence-based data analytics methods and discuss "what insights" can be derived from a broad spectrum of health data analytics methods to improve health system management, health outcomes, knowledge discovery, and healthcare innovation.
#> 161 Psychological sciences have identified a wealth of cognitive processes and behavioral phenomena, yet struggle to produce cumulative knowledge. Progress is hamstrung by siloed scientific traditions and a focus on explanation over prediction, two issues that are particularly damaging for the study of multifaceted constructs like self-regulation. Here, we derive a psychological ontology from a study of individual differences across a broad range of behavioral tasks, self-report surveys, and self-reported real-world outcomes associated with self-regulation. Though both tasks and surveys putatively measure self-regulation, they show little empirical relationship. Within tasks and surveys, however, the ontology identifies reliable individual traits and reveals opportunities for theoretic synthesis. We then evaluate predictive power of the psychological measurements and find that while surveys modestly and heterogeneously predict real-world outcomes, tasks largely do not. We conclude that self-regulation lacks coherence as a construct, and that data-driven ontologies lay the groundwork for a cumulative psychological science.
#> 162
#> 163 Nurse leaders are dually responsible for resource stewardship and the delivery of high-quality care. However, methods to identify patient risk for hospital-acquired conditions are often outdated and crude. Although hospitals and health systems have begun to use data science and artificial intelligence in physician-led projects, these innovative methods have not seen adoption in nursing. We propose the Petri dish model, a theoretical hybrid model, which combines population ecology theory and human factors theory to explain the cost/benefit dynamics influencing the slow adoption of data science for hospital-based nursing. The proliferation of nurse-led data science in health systems may be facing several barriers: a scarcity of doctorally prepared nurse scientists with expertise in data science; internal structural inertia; an unaligned national "precision health" strategy; and a federal reimbursement landscape, which constrains-but does not negate the hard dollar business case. Nurse executives have several options: deferring adoption, outsourcing services, and investing in internal infrastructure to develop and implement risk models. The latter offers the best performing models. Progress in nurse-led data science work has been sluggish. Balanced partnerships with physician experts and organizational stakeholders are needed, as is a balanced PhD-DNP research-practice collaboration model.
#> 164 Thailand's transition to high middle-income country status has been accompanied by demographic changes and associated shifts in the nation's public health challenges. These changes have necessitated a significant shift in public health focus from the treatment of infectious diseases to the more expensive and protracted management of non-communicable diseases (NCDs) in older adults.In 2010, in response to this shift in focus, the University of Michigan and colleagues at the Praboromarajchanok Institute for Health Workforce Development in Thailand began work on a broad-based multi-institutional programme for NCD research capacity-building in Thailand.To begin to build a base of intervention research we paired our programme's funded Thai postdoctoral fellows with United States mentors who have strong programmes of intervention research. One direct impact of the programme was the development of research 'hubs' focused upon similar areas of investigative focus such as self-management of cancer symptoms, self-management of HIV/AIDS and health technology information applications for use in community settings. Within these hubs, interventions with proven efficacy in the United States were used as a foundation for culturally relevant interventions in Thailand. The programme also aimed to develop the research support structures necessary within departments and colleges for grant writing and management, dissemination of new knowledge, and ethical conduct of human subject research.In an effort to capitalise on large national health datasets and big data now available in Thailand, several of the programme's postdoctoral fellows began projects that use data science methods to mine this asset. 
The investigators involved in these ground-breaking projects form the core of a network of research hubs that will be able to capitalise on the availability of lifespan health data from across Thailand and provide a robust working foundation for expansion of research using data science approaches.Going forward, it is vitally important to leverage this groundwork in order to continue fostering rapid growth in NCD research and training as well as to capitalise upon these early gains to create a sustaining influence for Thailand to lead in NCD research, improve the health of its citizens, and provide ongoing leadership in Southeast Asia.
#> 165 OBJECTIVES: The objective of this study is to explore facilitating factors for collaboration at hackathons, intensive events bringing together data scientists ('hackers') with experts in particular subject areas.STUDY DESIGN: This is a qualitative study.METHODS: Semistructured interviews were conducted with organisers before and after the event. The initial exploratory interviews influenced the content of questionnaires which were distributed to all participants asking about their motivations and experiences. Thematic analysis was used to explore key features of collaboration.RESULTS: Facilitating factors were clustered under the themes of preparation (the right amount of pre-event information, methods to maximise attendance and identification of suitable challenges), participants (enough people to progress and a mixture of skills and experience), working together (mutual understanding of the aim, getting the best out of each other, overcoming challenges together, effective facilitation and an enjoyable and valuable experience) and follow-up (recognised process for feedback and support for the development of prototypes).CONCLUSIONS: The findings of the study provide insight into fostering collaboration in this context and provide evidence that may be used to tailor future events for the effective delivery of technological and marketing-based solutions to public health challenges. Hackathons provide a methodological advance with potential for broad public health application.
#> 166 OBJECTIVES: The objective of this study is to explore facilitating factors for collaboration at hackathons, intensive events bringing together data scientists ('hackers') with experts in particular subject areas.STUDY DESIGN: This is a qualitative study.METHODS: Semistructured interviews were conducted with organisers before and after the event. The initial exploratory interviews influenced the content of questionnaires which were distributed to all participants asking about their motivations and experiences. Thematic analysis was used to explore key features of collaboration.RESULTS: Facilitating factors were clustered under the themes of preparation (the right amount of pre-event information, methods to maximise attendance and identification of suitable challenges), participants (enough people to progress and a mixture of skills and experience), working together (mutual understanding of the aim, getting the best out of each other, overcoming challenges together, effective facilitation and an enjoyable and valuable experience) and follow-up (recognised process for feedback and support for the development of prototypes).CONCLUSIONS: The findings of the study provide insight into fostering collaboration in this context and provide evidence that may be used to tailor future events for the effective delivery of technological and marketing-based solutions to public health challenges. Hackathons provide a methodological advance with potential for broad public health application.
#> 167 As public health and health care increase focus toward addressing social determinants of health (SDH), the growth of data and analytics affords new, impactful tools for data-informed community health improvement. Best practices should be established for responsible use, meaningful interpretation, and actionable implementation of SDH data for community health improvement.
#> 168 As public health and health care increase focus toward addressing social determinants of health (SDH), the growth of data and analytics affords new, impactful tools for data-informed community health improvement. Best practices should be established for responsible use, meaningful interpretation, and actionable implementation of SDH data for community health improvement.
#> 169
#> 170 There is great interest in and excitement about the concept of personalized or precision medicine and, in particular, advancing this vision via various 'big data' efforts. While these methods are necessary, they are insufficient to achieve the full personalized medicine promise. A rigorous, complementary 'small data' paradigm that can function both autonomously from and in collaboration with big data is also needed. By 'small data' we build on Estrin's formulation and refer to the rigorous use of data by and for a specific N-of-1 unit (i.e., a single person, clinic, hospital, healthcare system, community, city, etc.) to facilitate improved individual-level description, prediction and, ultimately, control for that specific unit.</AbstractText>: There is great interest in and excitement about the concept of personalized or precision medicine and, in particular, advancing this vision via various 'big data' efforts. While these methods are necessary, they are insufficient to achieve the full personalized medicine promise. A rigorous, complementary 'small data' paradigm that can function both autonomously from and in collaboration with big data is also needed. By 'small data' we build on Estrin's formulation and refer to the rigorous use of data by and for a specific N-of-1 unit (i.e., a single person, clinic, hospital, healthcare system, community, city, etc.) to facilitate improved individual-level description, prediction and, ultimately, control for that specific unit.The purpose of this piece is to articulate why a small data paradigm is needed and is valuable in itself, and to provide initial directions for future work that can advance study designs and data analytic techniques for a small data approach to precision health. Scientifically, the central value of a small data approach is that it can uniquely manage complex, dynamic, multi-causal, idiosyncratically manifesting phenomena, such as chronic diseases, in comparison to big data. 
Beyond this, a small data approach better aligns the goals of science and practice, which can result in more rapid agile learning with less data. There is also, feasibly, a unique pathway towards transportable knowledge from a small data approach, which is complementary to a big data approach. Future work should (1) further refine appropriate methods for a small data approach; (2) advance strategies for better integrating a small data approach into real-world practices; and (3) advance ways of actively integrating the strengths and limitations from both small and big data approaches into a unified scientific knowledge base that is linked via a robust science of causality.</AbstractText>: The purpose of this piece is to articulate why a small data paradigm is needed and is valuable in itself, and to provide initial directions for future work that can advance study designs and data analytic techniques for a small data approach to precision health. Scientifically, the central value of a small data approach is that it can uniquely manage complex, dynamic, multi-causal, idiosyncratically manifesting phenomena, such as chronic diseases, in comparison to big data. Beyond this, a small data approach better aligns the goals of science and practice, which can result in more rapid agile learning with less data. There is also, feasibly, a unique pathway towards transportable knowledge from a small data approach, which is complementary to a big data approach. Future work should (1) further refine appropriate methods for a small data approach; (2) advance strategies for better integrating a small data approach into real-world practices; and (3) advance ways of actively integrating the strengths and limitations from both small and big data approaches into a unified scientific knowledge base that is linked via a robust science of causality.Small data is valuable in its own right. 
That said, small and big data paradigms can and should be combined via a foundational science of causality. With these approaches combined, the vision of precision health can be achieved.</AbstractText>: Small data is valuable in its own right. That said, small and big data paradigms can and should be combined via a foundational science of causality. With these approaches combined, the vision of precision health can be achieved.
#> 171
#> 172 Big data analytics enables large-scale data sets integration, supporting people management decisions, and cost-effectiveness evaluation of healthcare organizations. The purpose of this article is to address the decision-making process based on big data analytics in Healthcare organizations, to identify main big data analytics able to support healthcare leaders' decisions and to present some strategies to enhance efficiency along the healthcare value chain. Our research was based on a systematic review. During the literature review, we will be presenting as well the different applications of big data in the healthcare context and a proposal for a predictive model for people management processes. Our research underlines the importance big data analytics can add to the efficiency of the decision-making process, through a predictive model and real-time analytics, assisting in the collection, management, and integration of data in healthcare organizations.
#> 173 Ending the HIV Epidemic: A Plan for America" (EtHE), launched by the Department of Health and Human Services (DHHS), is predicated on actionable data systems to monitor progress toward ambitious goals and to guide human immunodeficiency virus (HIV) testing, prevention, and treatment services. Situated on a status-neutral continuum of HIV prevention and care, EtHE relies on coordination across DHHS agencies and utilization of data systems established for programmatic purposes. Improving efficiencies and timeliness of existing data systems and harnessing the potential of novel data systems, including those afforded by social media, require big data science approaches and investment in technological and human resources.
#> 174 The goal of the Precision in Symptom Self-Management (PriSSM) Center is to advance the science of symptom self-management for Latinos through a social ecological lens that takes into account variability in individual, interpersonal, organizational, and environmental factors across the life course. Informatics and data science methods are foundational to PriSSM's research activities including its pilot studies and research resources. This work highlights three areas: Latino Data Repository, Information Visualization, and Center Evaluation.
#> 175 Air pollution has emerged as one of the world's largest environmental health threats, with various studies demonstrating associations between exposure to air pollution and respiratory and cardiovascular diseases. Regional air quality in Southeast Asia has been seasonally affected by the transboundary haze problem, which has often been the result of forest fires from "slash-and-burn" farming methods. In light of growing public health concerns, recent studies have begun to examine the health effects of this seasonal haze problem in Southeast Asia. This review paper aims to synthesize current research efforts on the impact of the Southeast Asian transboundary haze on acute aspects of public health. Existing studies conducted in countries affected by transboundary haze indicate consistent links between haze exposure and acute psychological, respiratory, cardiovascular, and neurological morbidity and mortality. Future prospective and longitudinal studies are warranted to quantify the long-term health effects of recurrent, but intermittent, exposure to high levels of seasonal haze. The mechanism, toxicology and pathophysiology by which these toxic particles contribute to disease and mortality should be further investigated. Epidemiological studies on the disease burden and socioeconomic cost of haze exposure would also be useful to guide policy-making and international strategy in minimizing the impact of seasonal haze in Southeast Asia.
#> 176 Pharmacoepidemiology is the study of the safety and effectiveness of medications following market approval. The increased availability and size of healthcare utilization databases allows for the study of rare adverse events, sub-group analyses, and long-term follow-up. These datasets are large, including thousands of patient records spanning multiple years of observation, and representative of real-world clinical practice. Thus, one of the main advantages is the possibility to study the real-world safety and effectiveness of medications in uncontrolled environments. Due to the large size (volume), structure (variety), and availability (velocity) of observational healthcare databases there is a large interest in the application of natural language processing and machine learning, including the development of novel models to detect drug-drug interactions, patient phenotypes, and outcome prediction. This report will provide an overview of the current challenges in pharmacoepidemiology and where machine learning applications may be useful for filling the gap.
#> journal
#> 1 Medicine and health, Rhode Island
#> 2 Medicine and health, Rhode Island
#> 3 Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology
#> 4 The Journal of law, medicine & ethics : a journal of the American Society of Law, Medicine & Ethics
#> 5 The Journal of law, medicine & ethics : a journal of the American Society of Law, Medicine & Ethics
#> 6 Clinical journal of sport medicine : official journal of the Canadian Academy of Sport Medicine
#> 7 Preventive medicine
#> 8 Journal of the California Dental Association
#> 9 Chest
#> 10 JAMA dermatology
#> 11 Journal of public health management and practice : JPHMP
#> 12 Science (New York, N.Y.)
#> 13 Journal of medical Internet research
#> 14 Science (New York, N.Y.)
#> 15 PLoS computational biology
#> 16 PLoS computational biology
#> 17 Public health reports (Washington, D.C. : 1974)
#> 18 Public health
#> 19 Epidemiology (Cambridge, Mass.)
#> 20 Bioscience trends
#> 21 Journal of medical Internet research
#> 22 Annals of epidemiology
#> 23 Journal of neurotrauma
#> 24 American journal of public health
#> 25 Zhonghua yu fang yi xue za zhi [Chinese journal of preventive medicine]
#> 26 Computers in biology and medicine
#> 27 American journal of epidemiology
#> 28 American journal of epidemiology
#> 29 Gaceta sanitaria
#> 30 Gaceta sanitaria
#> 31 Duodecim; laaketieteellinen aikakauskirja
#> 32 International journal of environmental research and public health
#> 33 AMIA ... Annual Symposium proceedings. AMIA Symposium
#> 34 Giornale italiano di cardiologia (2006)
#> 35 Journal of the American Medical Informatics Association : JAMIA
#> 36 Annals of epidemiology
#> 37 Applied clinical informatics
#> 38 Medical care
#> 39 Journal of health politics, policy and law
#> 40 Journal of medical systems
#> 41 Yearbook of medical informatics
#> 42 Genetic epidemiology
#> 43 Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
#> 44 Journal of medical Internet research
#> 45 Clinical rheumatology
#> 46 Annals of epidemiology
#> 47 Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual Conference
#> 48 Journal of evidence-based medicine
#> 49 Microbial genomics
#> 50 AIDS and behavior
#> 51 Scientific reports
#> 52 Journal of epidemiology and community health
#> 53 Environmental science & technology
#> 54 Annals of epidemiology
#> 55 Studies in health technology and informatics
#> 56 Journal of clinical monitoring and computing
#> 57 The Journal of infectious diseases
#> 58 International journal of environmental research and public health
#> 59 Journal of epidemiology and community health
#> 60 Briefings in bioinformatics
#> 61 JAMA
#> 62 Gesundheitswesen (Bundesverband der Arzte des Offentlichen Gesundheitsdienstes (Germany))
#> 63 Journal of the American Medical Informatics Association : JAMIA
#> 64 Cadernos de saude publica
#> 65 Gesundheitswesen (Bundesverband der Arzte des Offentlichen Gesundheitsdienstes (Germany))
#> 66 PLoS neglected tropical diseases
#> 67 Journal of healthcare engineering
#> 68 Advances in experimental medicine and biology
#> 69 Human genomics
#> 70 Annals of global health
#> 71 Duodecim; laaketieteellinen aikakauskirja
#> 72 Annual review of public health
#> 73 Clinical nursing research
#> 74 Journal of neuroscience methods
#> 75 Drug discovery today
#> 76 Journal of public health policy
#> 77 Journal of public health policy
#> 78 Journal of public health policy
#> 79 Journal of public health policy
#> 80 Journal of public health policy
#> 81 Journal of public health policy
#> 82 Journal of public health policy
#> 83 AIDS and behavior
#> 84 Health & place
#> 85 BMJ open
#> 86 Nursing research
#> 87 Nature
#> 88 Puerto Rico health sciences journal
#> 89 JAMA oncology
#> 90 Scientific reports
#> 91 Environmental pollution (Barking, Essex : 1987)
#> 92 Journal of biomedical informatics
#> 93 Health affairs (Project Hope)
#> 94 Breast cancer research and treatment
#> 95 Clinical and translational science
#> 96 Journal of forensic and legal medicine
#> 97 Life sciences, society and policy
#> 98 Euro surveillance : bulletin Europeen sur les maladies transmissibles = European communicable disease bulletin
#> 99 International journal for equity in health
#> 100 Biological psychiatry. Cognitive neuroscience and neuroimaging
#> 101 The International journal on drug policy
#> 102 The International journal on drug policy
#> 103 Academic emergency medicine : official journal of the Society for Academic Emergency Medicine
#> 104 Future oncology (London, England)
#> 105 Psychiatry research
#> 106 American journal of public health
#> 107 Developing world bioethics
#> 108 Ethnicity & disease
#> 109 Applied clinical informatics
#> 110 The Journal of law, medicine & ethics : a journal of the American Society of Law, Medicine & Ethics
#> 111 Yearbook of medical informatics
#> 112 Telemedicine journal and e-health : the official journal of the American Telemedicine Association
#> 113 Nature
#> 114 The lancet. HIV
#> 115 Journal of the Royal Society, Interface
#> 116 Current opinion in structural biology
#> 117 Big data
#> 118 Big data
#> 119 Zhonghua liu xing bing xue za zhi = Zhonghua liuxingbingxue zazhi
#> 120 Annals of surgical oncology
#> 121 The Science of the total environment
#> 122 Public health
#> 123 Bioprocess and biosystems engineering
#> 124 Scientific reports
#> 125 Journal of primary care & community health
#> 126 Military medicine
#> 127 Drug and alcohol dependence
#> 128 Seminars in pediatric surgery
#> 129 Modern healthcare
#> 130 Proceedings of the National Academy of Sciences of the United States of America
#> 131 International journal of environmental research and public health
#> 132 Methods in molecular biology (Clifton, N.J.)
#> 133 Archives of Iranian medicine
#> 134 BMJ open
#> 135 International journal of public health
#> 136 BMC medical informatics and decision making
#> 137 BMC public health
#> 138 Bioethics
#> 139 Zhonghua liu xing bing xue za zhi = Zhonghua liuxingbingxue zazhi
#> 140 Epidemiology and health
#> 141 Epidemiology and health
#> 142 BMC infectious diseases
#> 143 Diagnostic microbiology and infectious disease
#> 144 La Tunisie medicale
#> 145 La Tunisie medicale
#> 146 La Tunisie medicale
#> 147 Current pharmaceutical biotechnology
#> 148 Trends in neurosciences
#> 149 AMIA ... Annual Symposium proceedings. AMIA Symposium
#> 150 Clinical & translational oncology : official publication of the Federation of Spanish Oncology Societies and of the National Cancer Institute of Mexico
#> 151 La Revue de medecine interne
#> 152 PloS one
#> 153 La Medicina del lavoro
#> 154 BMC bioinformatics
#> 155 International journal of health geographics
#> 156 Maternal and child health journal
#> 157 The Science of the total environment
#> 158 American Society of Clinical Oncology educational book. American Society of Clinical Oncology. Annual Meeting
#> 159 Emerging infectious diseases
#> 160 Healthcare management forum
#> 161 Nature communications
#> 162 International journal of dermatology
#> 163 Nursing administration quarterly
#> 164 Health research policy and systems
#> 165 Public health
#> 166 Public health
#> 167 North Carolina medical journal
#> 168 North Carolina medical journal
#> 169 Health care management science
#> 170 BMC medicine
#> 171 Environmental pollution (Barking, Essex : 1987)
#> 172 Journal of medical systems
#> 173 Infectious disease clinics of North America
#> 174 Studies in health technology and informatics
#> 175 International journal of environmental research and public health
#> 176 Chimia
#> year keyword
#> 1 2004 Public Health Informatics
#> 2 2004 Public Health Practice
#> 3 2013 Public Health
#> 4 2013 Public Health
#> 5 2013 Public Health Practice
#> 6 2013 Public Health
#> 7 2014 Public Health
#> 8 2014 Public Health
#> 9 2014 Public Health
#> 10 2014 Public Health
#> 11 2014 Public Health
#> 12 2014 Public Health
#> 13 2014 Public Health Surveillance
#> 14 2015 Public Health
#> 15 2015 Public Health
#> 16 2015 Public Health
#> 17 2015 Public Health
#> 18 2015 Public Health
#> 19 2015 Public Health
#> 20 2015 Public Health
#> 21 2015 Public Health Surveillance
#> 22 2015 Public Health
#> 23 2015 Public Health
#> 24 2015 Public Health Administration
#> 25 2015 Public Health
#> 26 2015 Public Health
#> 27 2015 Public Health
#> 28 2015 Schools, Public Health
#> 29 2015 Public Health
#> 30 2015 Public Health
#> 31 2016 Public Health
#> 32 2016 Public Health
#> 33 2015 Public Health
#> 34 2016 Public Health
#> 35 2016 Public Health
#> 36 2016 Public Health
#> 37 2016 Public Health
#> 38 2016 Public Health
#> 39 2016 Public Health
#> 40 2016 Public Health
#> 41 2016 Public Health
#> 42 2016 Public Health
#> 43 2016 Public Health
#> 44 2016 Public Health
#> 45 2016 Public Health
#> 46 2017 Public Health
#> 47 2017 Public Health
#> 48 2017 Public Health
#> 49 2017 Public Health
#> 50 2017 Public Health
#> 51 2017 Public Health Surveillance
#> 52 2017 Public Health
#> 53 2017 Public Health
#> 54 2017 Public Health
#> 55 2017 Public Health
#> 56 2017 Data Science
#> 57 2017 Public Health Administration
#> 58 2017 Public Health
#> 59 2017 Public Health
#> 60 2017 Data Science
#> 61 2017 Public Health
#> 62 2017 Public Health
#> 63 2017 Data Science
#> 64 2017 Public Health
#> 65 2017 Public Health
#> 66 2017 Public Health
#> 67 2017 Data Science
#> 68 2017 Public Health
#> 69 2017 Public Health
#> 70 2017 Public Health
#> 71 2017 Public Health
#> 72 2017 Public Health
#> 73 2017 Data Science
#> 74 2017 Data Science
#> 75 2018 Public Health
#> 76 2018 Public Health
#> 77 2018 Public Health
#> 78 2018 Public Health
#> 79 2018 Public Health
#> 80 2018 Public Health
#> 81 2018 Public Health
#> 82 2018 Public Health
#> 83 2018 Public Health
#> 84 2018 Public Health
#> 85 2018 Public Health
#> 86 2018 Data Science
#> 87 2018 Public Health
#> 88 2018 Data Science
#> 89 2018 Data Science
#> 90 2018 Data Science
#> 91 2018 Public Health
#> 92 2018 Data Science
#> 93 2018 Public Health
#> 94 2018 Public Health Surveillance
#> 95 2018 Data Science
#> 96 2018 Public Health
#> 97 2018 Public Health Informatics
#> 98 2018 Public Health Surveillance
#> 99 2018 Public Health
#> 100 2018 Data Science
#> 101 2018 Data Science
#> 102 2018 Public Health
#> 103 2018 Data Science
#> 104 2018 Public Health Surveillance
#> 105 2018 Data Science
#> 106 2018 Public Health Surveillance
#> 107 2018 Public Health
#> 108 2018 Public Health
#> 109 2018 Data Science
#> 110 2018 Public Health
#> 111 2018 Public Health Informatics
#> 112 2018 Public Health Administration
#> 113 2018 Public Health
#> 114 2018 Public Health
#> 115 2018 Data Science
#> 116 2018 Data Science
#> 117 2018 Data Science
#> 118 2018 Data Science
#> 119 2018 Public Health
#> 120 2018 Data Science
#> 121 2018 Data Science
#> 122 2018 Public Health Surveillance
#> 123 2018 Data Science
#> 124 2018 Data Science
#> 125 2018 Data Science
#> 126 2018 Public Health
#> 127 2018 Data Science
#> 128 2018 Data Science
#> 129 2016 Data Science
#> 130 2018 Data Science
#> 131 2018 Public Health
#> 132 2018 Data Science
#> 133 2018 Public Health
#> 134 2018 Public Health
#> 135 2018 Public Health
#> 136 2018 Public Health
#> 137 2019 Public Health
#> 138 2019 Public Health
#> 139 2019 Data Science
#> 140 2019 Data Science
#> 141 2019 Public Health
#> 142 2019 Public Health
#> 143 2019 Data Science
#> 144 2019 Public Health
#> 145 2019 Public Health Administration
#> 146 2019 Public Health Systems Research
#> 147 2019 Public Health
#> 148 2019 Data Science
#> 149 2019 Data Science
#> 150 2019 Data Science
#> 151 2019 Data Science
#> 152 2019 Data Science
#> 153 2019 Public Health
#> 154 2019 Data Science
#> 155 2019 Public Health
#> 156 2019 Data Science
#> 157 2019 Public Health
#> 158 2019 Public Health Surveillance
#> 159 2019 Public Health Surveillance
#> 160 2019 Data Science
#> 161 2019 Data Science
#> 162 2019 Data Science
#> 163 2019 Data Science
#> 164 2019 Public Health
#> 165 2019 Data Science
#> 166 2019 Public Health
#> 167 2019 Data Science
#> 168 2019 Public Health
#> 169 2019 Data Science
#> 170 2019 Data Science
#> 171 2019 Public Health
#> 172 2019 Data Science
#> 173 2019 Data Science
#> 174 2019 Data Science
#> 175 2019 Public Health
#> 176 2019 Data Science
Topic modelling is a form of unsupervised machine learning that can help us classify texts. There are two main packages in R for this, topicmodels and stm. In this workflow we use the NLP package udpipe to tokenise and annotate texts, and topicmodels to classify and visualise documents.
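To make the idea concrete, here is a minimal sketch of fitting a topic model with topicmodels on a toy corpus. The four documents, the choice of k = 2 and the Gibbs settings are all illustrative assumptions, not values used elsewhere in this vignette; the sketch only shows the general shape of the input (a document-term matrix of integer counts) and the two outputs that matter later (a topic per document and the top terms per topic).

```r
library(topicmodels)

# Toy corpus: four tiny "abstracts", already tokenised (hypothetical data)
docs <- list(
  d1 = c("vaccine", "trial", "vaccine", "dose"),
  d2 = c("trial", "dose", "vaccine", "placebo"),
  d3 = c("network", "algorithm", "data", "algorithm"),
  d4 = c("data", "network", "algorithm", "model")
)

# Document-term matrix: one row per document, one column per term, integer counts
vocab <- sort(unique(unlist(docs)))
dtm <- t(sapply(docs, function(tokens) table(factor(tokens, levels = vocab))))
storage.mode(dtm) <- "integer"

# Fit a two-topic LDA model; the seed makes the toy run repeatable
lda <- LDA(dtm, k = 2, method = "Gibbs", control = list(seed = 2020, iter = 200))

topics(lda)    # most likely topic for each document
terms(lda, 3)  # top three terms for each topic
```

Because the model is unsupervised, the topic numbers carry no meaning of their own; the top terms are what allow a topic to be labelled.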
To facilitate this process we have added 4 functions to the myScrapers package. These are:
annotate_abstracts. This parses and annotates abstracts: it splits the abstracts into individual words (tokens) and adds parts of speech to each token. The function downloads the English language model for udpipe. It takes two arguments: abstract text and abstract identifier (pmid).
abstract_nounphrases. This creates noun phrases (compound terms).
abstract_topics. This does the necessary processing on the annotated data to convert it to a form that topicmodels can run. It outputs the topic assignment for each abstract and the top terms from each topic. The number of topics (k) is specified by the user.
topic_viz. This creates a network visualisation for a topic.
Let’s illustrate how the flow works.
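As a rough picture of the conversion step described for abstract_topics, the sketch below turns a small annotation table into a document-term matrix using base R only. The table anno_toy is hypothetical and merely mimics the doc_id, lemma and upos columns that udpipe produces; restricting to nouns is an assumption made for illustration, and the package function may process the data differently.

```r
# Hypothetical annotation table in the shape udpipe returns (doc_id, lemma, upos)
anno_toy <- data.frame(
  doc_id = c("a", "a", "a", "a", "b", "b", "b"),
  lemma  = c("study", "be", "safety", "study", "haze", "affect", "health"),
  upos   = c("NOUN", "AUX", "NOUN", "NOUN", "NOUN", "VERB", "NOUN"),
  stringsAsFactors = FALSE
)

# Keep content-bearing tokens only (here: nouns)
nouns <- anno_toy[anno_toy$upos == "NOUN", ]

# Cross-tabulate documents against lemmas to get a document-term matrix
dtm_toy <- unclass(table(nouns$doc_id, nouns$lemma))
dtm_toy
```

Each row of dtm_toy is a document and each column a term, which is the layout a topic model expects.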
The first step is to parse the abstracts. Note: this can take some time.
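library(udpipe)
## keep only records that have an abstract
results_na <- results$abstracts %>%
  filter(nchar(abstract) > 0)
## unique abstract texts
x <- results_na %>%
  select(abstract) %>%
  distinct()
nrow(x)
#> [1] 1759
## unique identifiers (the DOI column is used as the document id)
id <- results_na %>%
  select(DOI) %>%
  distinct()
nrow(id)
#> [1] 1759
## tokenise and annotate; this assumes x and id line up row by row, so the two counts above should agree
anno <- annotate_abstracts(abstract = x$abstract, pmid = id$DOI)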
#> Downloading udpipe model from https://raw.githubusercontent.com/jwijffels/udpipe.models.ud.2.4/master/inst/udpipe-ud-2.4-190531/english-ewt-ud-2.4-190531.udpipe to /Users/julianflowers/Documents/New-R-projects/phds-article/english-ewt-ud-2.4-190531.udpipe
#> Visit https://github.com/jwijffels/udpipe.models.ud.2.4 for model license details
head(anno)
#> doc_id paragraph_id sentence_id
#> 1 31883553 1 1
#> 2 31883553 1 1
#> 3 31883553 1 1
#> 4 31883553 1 1
#> 5 31883553 1 1
#> 6 31883553 1 1
#> sentence
#> 1 Pharmacoepidemiology is the study of the safety and effectiveness of medications following market approval.
#> 2 Pharmacoepidemiology is the study of the safety and effectiveness of medications following market approval.
#> 3 Pharmacoepidemiology is the study of the safety and effectiveness of medications following market approval.
#> 4 Pharmacoepidemiology is the study of the safety and effectiveness of medications following market approval.
#> 5 Pharmacoepidemiology is the study of the safety and effectiveness of medications following market approval.
#> 6 Pharmacoepidemiology is the study of the safety and effectiveness of medications following market approval.
#> token_id token lemma upos xpos
#> 1 1 Pharmacoepidemiology Pharmacoepidemiology NOUN NN
#> 2 2 is be AUX VBZ
#> 3 3 the the DET DT
#> 4 4 study study NOUN NN
#> 5 5 of of ADP IN
#> 6 6 the the DET DT
#> feats head_token_id dep_rel
#> 1 Number=Sing 4 nsubj
#> 2 Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 4 cop
#> 3 Definite=Def|PronType=Art 4 det
#> 4 Number=Sing 0 root
#> 5 <NA> 7 case
#> 6 Definite=Def|PronType=Art 7 det
#> deps misc topic_id
#> 1 <NA> <NA> 16497
#> 2 <NA> <NA> 16497
#> 3 <NA> <NA> 16497
#> 4 <NA> <NA> 16497
#> 5 <NA> <NA> 16497
#> 6 <NA> <NA> 16497
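The annotation has one row per token, whereas a topic model needs one row per document and one column per term. The sketch below shows the general idea of that conversion in base R, using a small hand-made table with the same doc_id, lemma and upos columns as the output above. The values are invented, keeping only nouns and adjectives is an assumption about sensible preprocessing, and this is not necessarily how myScrapers performs the conversion.

```r
# Hypothetical token-level annotation with the same column names as `anno`
toy_anno <- data.frame(
  doc_id = c("d1", "d1", "d1", "d1", "d2", "d2", "d2"),
  lemma  = c("study", "of", "safety", "study", "big", "data", "study"),
  upos   = c("NOUN", "ADP", "NOUN", "NOUN", "ADJ", "NOUN", "NOUN"),
  stringsAsFactors = FALSE
)

# Keep content words only (assumption: nouns and adjectives)
content <- toy_anno[toy_anno$upos %in% c("NOUN", "ADJ"), ]

# Cross-tabulate documents against lemmas to get a document-term count matrix
dtm <- unclass(table(content$doc_id, content$lemma))
```

Each row of `dtm` is a document and each cell is the number of times a lemma occurs in it. A count matrix of this shape is the kind of input a topic model is fitted to, with k setting the number of topics.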
The next step takes the annotated data (anno) and extracts noun phrases from it. Phrases can enrich the topic modelling step, but this step is optional.
np <- abstract_nounphrases(anno)
np %>%
  filter(!is.na(term)) %>%
  head(10) %>%
  select(doc_id, sentence, term)
#> doc_id
#> 1 31883553
#> 2 31883553
#> 3 31883553
#> 4 31883553
#> 5 31883553
#> 6 31883553
#> 7 31883553
#> 8 31883553
#> 9 31883553
#> 10 31883553
#> sentence
#> 1 Pharmacoepidemiology is the study of the safety and effectiveness of medications following market approval.
#> 2 Pharmacoepidemiology is the study of the safety and effectiveness of medications following market approval.
#> 3 Pharmacoepidemiology is the study of the safety and effectiveness of medications following market approval.
#> 4 Pharmacoepidemiology is the study of the safety and effectiveness of medications following market approval.
#> 5 The increased availability and size of healthcare utilization databases allows for the study of rare adverse events, sub-group analyses, and long-term follow-up.
#> 6 The increased availability and size of healthcare utilization databases allows for the study of rare adverse events, sub-group analyses, and long-term follow-up.
#> 7 The increased availability and size of healthcare utilization databases allows for the study of rare adverse events, sub-group analyses, and long-term follow-up.
#> 8 The increased availability and size of healthcare utilization databases allows for the study of rare adverse events, sub-group analyses, and long-term follow-up.
#> 9 The increased availability and size of healthcare utilization databases allows for the study of rare adverse events, sub-group analyses, and long-term follow-up.
#> 10 The increased availability and size of healthcare utilization databases allows for the study of rare adverse events, sub-group analyses, and long-term follow-up.
#> term
#> 1 Pharmacoepidemiology
#> 2 study of the safety
#> 3 effectiveness of medications
#> 4 market approval
#> 5 availability
#> 6 size of healthcare utilization databases
#> 7 study of rare adverse events
#> 8 group analyses
#> 9 term follow
#> 10 up
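Terms such as "study of the safety" and "effectiveness of medications" in the output above are the kind of result you get by matching a pattern over the part-of-speech tags. The base R sketch below illustrates that idea: each tag is recoded to a single letter and a commonly used simple noun phrase expression is matched against the resulting string. The tokens and tags are typed in by hand for illustration, and whether abstract_nounphrases uses exactly this pattern is an assumption.

```r
# Hypothetical tokens and universal POS tags for one sentence fragment
token <- c("the", "study", "of", "the", "safety", "and",
           "effectiveness", "of", "medications")
upos  <- c("DET", "NOUN", "ADP", "DET", "NOUN", "CCONJ",
           "NOUN", "ADP", "NOUN")

# Recode each tag to one letter: N noun, A adjective, P preposition,
# D determiner, O anything else
code <- c(NOUN = "N", PROPN = "N", ADJ = "A", ADP = "P", DET = "D")[upos]
code[is.na(code)] <- "O"
tags <- paste(code, collapse = "")

# Simple noun phrase: optional modifiers, a noun, then optional
# "preposition (determiner) (modifiers) noun" extensions
pattern <- "(A|N)*N(P+D*(A|N)*N)*"
hits    <- gregexpr(pattern, tags, perl = TRUE)[[1]]
starts  <- as.integer(hits)
ends    <- starts + attr(hits, "match.length") - 1

# Map each matched run of letters back to the original tokens
phrases <- mapply(function(s, e) paste(token[s:e], collapse = " "),
                  starts, ends)
phrases
# "study of the safety" "effectiveness of medications"
```

Because the match runs over tags rather than words, the conjunction "and" (recoded as O) ends the first phrase, which is why the two phrases come out separately.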
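## network visualisation for one topic (here n = 10); `topics` holds the model and scores from the topic modelling step
topic <- myScrapers::abstract_topic_viz(x = np, m = topics$model, scores = topics$scores, n = 10)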
#>
#> Attaching package: 'igraph'
#> The following objects are masked from 'package:dplyr':
#>
#> as_data_frame, groups, union
#> The following objects are masked from 'package:purrr':
#>
#> compose, simplify
#> The following object is masked from 'package:tidyr':
#>
#> crossing
#> The following object is masked from 'package:tibble':
#>
#> as_data_frame
#> The following objects are masked from 'package:stats':
#>
#> decompose, spectrum
#> The following object is masked from 'package:base':
#>
#> union
#> Registered S3 methods overwritten by 'huge':
#> method from
#> plot.sim BDgraph
#> print.sim BDgraph
#>
#> Attaching package: 'qgraph'
#> The following object is masked from 'package:ggraph':
#>
#> qgraph
## repeat for n = 1 to 10, returning a list of figures
figures <- map(1:10, ~ abstract_topic_viz(x = np, m = topics$model, scores = topics$scores, n = .x))