Required packages

The following packages are required to run this program.

require(easyPubMed)

## Loading required package: easyPubMed

require(tm)

## Loading required package: tm

## Loading required package: NLP

require(udpipe)

## Loading required package: udpipe

require(wordcloud)

## Loading required package: wordcloud

## Loading required package: RColorBrewer

require(word2vec)

## Loading required package: word2vec

require(Rtsne)

## Loading required package: Rtsne

require(plotly)

## Loading required package: plotly

## Loading required package: ggplot2

## 
## Attaching package: 'ggplot2'

## The following object is masked from 'package:NLP':
## 
##     annotate

## 
## Attaching package: 'plotly'

## The following object is masked from 'package:ggplot2':
## 
##     last_plot

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:graphics':
## 
##     layout

Data preparation

Retrieve data from PubMed.

query <- "COVID 19" 
ids <- get_pubmed_ids(query)
ids$Count

## [1] "324847"

This query returns 324,847 hits (as of 2022-12-26).

To reduce the volume, narrow the query.

query <- "COVID 19 AND machine learning" 
ids <- get_pubmed_ids(query)
ids$Count

## [1] "4372"

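This query returns 4,372 hits (as of 2022-12-26).

The analyses below use this result set.

First, fetch the PubMed records. The fetch_pubmed_data function can retrieve up to 4,999 records per call. (To retrieve more, call the function repeatedly while changing the start index.)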

pmd.xml <- fetch_pubmed_data(ids, retmax = 5000)
pmd.list <- articles_to_list(pmd.xml)
length(pmd.list)

## [1] 4372
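The repeated-call approach mentioned above can be sketched as follows. This is a minimal sketch, not the author's code: the helper batch_starts is a hypothetical name, the total of 12,000 records is made up for illustration, and the commented-out loop assumes that fetch_pubmed_data accepts retstart and retmax as zero-based offset and batch size.

```r
# Zero-based start offsets for fetching `total` records in batches of `batch_size`.
# `batch_starts` is a hypothetical helper, not part of easyPubMed.
batch_starts <- function(total, batch_size) {
  if (total <= 0) return(numeric(0))
  seq(0, total - 1, by = batch_size)
}

starts <- batch_starts(total = 12000, batch_size = 4999)
starts  # 0 4999 9998

# Hypothetical usage (not run here; requires easyPubMed and network access):
# xml_parts <- lapply(starts, function(s) {
#   fetch_pubmed_data(ids, retstart = s, retmax = 4999)
# })
# pmd.list <- unlist(lapply(xml_parts, articles_to_list))
```

For the 4,372 records used here a single call is enough, so the loop only matters for larger result sets.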

Build a data frame of titles and abstracts, and drop articles with missing values (no abstract).

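# Pre-allocate one slot per article for the title and the abstract.
titl <- rep(NA, length(pmd.list))
abst <- rep(NA, length(pmd.list))
for (i in seq_along(pmd.list)) {
  # max_chars = -1 keeps the full abstract; author details are not needed here.
  df <- article_to_df(pmd.list[[i]], max_chars = -1, getAuthors = FALSE)
  titl[i] <- df$title
  abst[i] <- df$abstract
}
df <- data.frame(titl, abst)
# Drop rows with a missing title or abstract.
df <- na.omit(df)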
dim(df)

## [1] 4275    2

Analysis of the titles

Examine how often each word appears in the titles.

First, check the original title data.

doc <- df$titl
doc[1:2]

## [1] "New advances in prediction and surveillance of preeclampsia: role of machine learning approaches and remote monitoring."
## [2] "Comparing Short-Term Univariate and Multivariate Time-Series Forecasting Models in Infectious Disease Outbreak."

The next step is to convert everything to lowercase and remove digits, brackets, and punctuation.

Before doing so, replace hyphens with spaces.

doc.tmp <- gsub("-", " ", doc)
doc[1:2]

## [1] "New advances in prediction and surveillance of preeclampsia: role of machine learning approaches and remote monitoring."
## [2] "Comparing Short-Term Univariate and Multivariate Time-Series Forecasting Models in Infectious Disease Outbreak."

doc.tmp[1:2]

## [1] "New advances in prediction and surveillance of preeclampsia: role of machine learning approaches and remote monitoring."
## [2] "Comparing Short Term Univariate and Multivariate Time Series Forecasting Models in Infectious Disease Outbreak."

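Convert to lowercase and remove digits, brackets, and punctuation. Note that the code below is applied to doc rather than doc.tmp, so the hyphen replacement is not carried over and hyphenated words end up joined in the output (for example, "shortterm" and "timeseries").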

doc.cleaned <- stripWhitespace(
  removePunctuation(
    removeNumbers(tolower(doc))))
doc.cleaned[1:2]

## [1] "new advances in prediction and surveillance of preeclampsia role of machine learning approaches and remote monitoring"
## [2] "comparing shortterm univariate and multivariate timeseries forecasting models in infectious disease outbreak"
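To make the effect of the hyphen step concrete, the cleaning can be sketched in base R. This is a rough stand-in for the tm functions used above, not their actual implementation; clean_title and its replace_hyphens argument are hypothetical names introduced only for this illustration.

```r
# Base-R approximation of the title cleaning steps.
# `clean_title` is a hypothetical helper, not part of tm.
clean_title <- function(x, replace_hyphens = TRUE) {
  if (replace_hyphens) x <- gsub("-", " ", x, fixed = TRUE)  # hyphen -> space
  x <- tolower(x)                      # lowercase
  x <- gsub("[0-9]+", "", x)           # drop digits
  x <- gsub("[[:punct:]]+", "", x)     # drop punctuation and brackets
  trimws(gsub("\\s+", " ", x))         # collapse repeated whitespace
}

title <- "Short-Term Time-Series Models (2020)."
clean_title(title)                           # "short term time series models"
clean_title(title, replace_hyphens = FALSE)  # "shortterm timeseries models"
```

Whether "short term" or "shortterm" is preferable depends on the analysis; the two choices produce different term counts downstream.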

Count term frequencies (per article).

dtf <- document_term_frequencies(doc.cleaned)
head(dtf, 20)

##     doc_id         term freq
##  1:   doc1          new    1
##  2:   doc1     advances    1
##  3:   doc1           in    1
##  4:   doc1   prediction    1
##  5:   doc1          and    2
##  6:   doc1 surveillance    1
##  7:   doc1           of    2
##  8:   doc1 preeclampsia    1
##  9:   doc1         role    1
## 10:   doc1      machine    1
## 11:   doc1     learning    1
## 12:   doc1   approaches    1
## 13:   doc1       remote    1
## 14:   doc1   monitoring    1
## 15:   doc2    comparing    1
## 16:   doc2    shortterm    1
## 17:   doc2   univariate    1
## 18:   doc2          and    1
## 19:   doc2 multivariate    1
## 20:   doc2   timeseries    1
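The shape of this table, one row per document and term, can be reproduced on toy data with base R. This is only a sketch of the idea under simple assumptions (whitespace tokenization, two made-up documents); it is not how document_term_frequencies is implemented.

```r
# Two toy documents, already cleaned and lowercased.
docs <- c("covid deep learning", "deep learning model deep")
tokens <- strsplit(docs, " ", fixed = TRUE)

# One row per (document, term) pair with its within-document count.
per_doc <- do.call(rbind, lapply(seq_along(tokens), function(i) {
  counts <- table(tokens[[i]])
  data.frame(doc_id = paste0("doc", i),
             term = names(counts),
             freq = as.integer(counts),
             stringsAsFactors = FALSE)
}))
per_doc

# Summing over documents gives corpus-wide totals per term.
totals <- tapply(per_doc$freq, per_doc$term, sum)
sort(totals, decreasing = TRUE)
```

Here "deep" totals 3 because it appears once in the first document and twice in the second.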

Sum the frequencies over all articles and show the top 100 words.

res <- tapply(dtf$freq, dtf$term, sum)
sort(res, decreasing = T)[1:100]

##             of          covid            and       learning            the 
##           2959           2830           2268           1645           1575 
##            for             in              a        machine          using 
##           1567           1462           1425           1090            949 
##             to             on           deep           with       patients 
##            732            694            639            609            451 
##     prediction           from      detection        sarscov       pandemic 
##            445            420            398            396            387 
##       analysis          study          model          based           data 
##            382            361            360            338            328 
##             an       approach          chest         images         during 
##            314            311            295            282            272 
##     artificial      diagnosis   intelligence        disease             ct 
##            256            253            227            216            215 
##         health classification       clinical           xray             by 
##            208            206            205            204            201 
##         models     predicting    coronavirus          novel         review 
##            197            188            183            172            171 
##           risk      infection      mortality       severity         system 
##            168            163            163            156            151 
##    development        network      pneumonia         neural  learningbased 
##            148            147            144            142            128 
##        predict         social          early    forecasting       features 
##            128            123            109            105            102 
##      framework    application         impact       networks           drug 
##            102            100             99             98             96 
##           lung        factors     validation     techniques      screening 
##             96             95             95             94             93 
##           care        methods     approaches           case       ensemble 
##             91             91             88             85             85 
##       modeling     predictive      algorithm identification          image 
##             85             85             84             84             84 
##      automated         method        vaccine        medical       transfer 
##             83             83             83             81             81 
##         public          among        against          cases        through 
##             80             79             78             77             77 
##          human        feature            new        twitter     algorithms 
##             75             73             73             73             72 
##        digital          media      potential    respiratory             as 
##             71             71             71             70             69

Remove common words that carry little meaning (stop words). First, prepare a list of such words.

stp <- stopwords("en")
head(stp, 20)

##  [1] "i"          "me"         "my"         "myself"     "we"        
##  [6] "our"        "ours"       "ourselves"  "you"        "your"      
## [11] "yours"      "yourself"   "yourselves" "he"         "him"       
## [16] "his"        "himself"    "she"        "her"        "hers"

Remove any term that matches an entry in the list above.

selector <- !(dtf$term %in% stp)
dtf.sel <- dtf[selector, ]

Count again and show the top 100.

word.count <- tapply(dtf.sel$freq, dtf.sel$term, sum)
sort(word.count, decreasing = T)[1:100]

##          covid       learning        machine          using           deep 
##           2830           1645           1090            949            639 
##       patients     prediction      detection        sarscov       pandemic 
##            451            445            398            396            387 
##       analysis          study          model          based           data 
##            382            361            360            338            328 
##       approach          chest         images     artificial      diagnosis 
##            311            295            282            256            253 
##   intelligence        disease             ct         health classification 
##            227            216            215            208            206 
##       clinical           xray         models     predicting    coronavirus 
##            205            204            197            188            183 
##          novel         review           risk      infection      mortality 
##            172            171            168            163            163 
##       severity         system    development        network      pneumonia 
##            156            151            148            147            144 
##         neural  learningbased        predict         social          early 
##            142            128            128            123            109 
##    forecasting       features      framework    application         impact 
##            105            102            102            100             99 
##       networks           drug           lung        factors     validation 
##             98             96             96             95             95 
##     techniques      screening           care        methods     approaches 
##             94             93             91             91             88 
##           case       ensemble       modeling     predictive      algorithm 
##             85             85             85             85             84 
## identification          image      automated         method        vaccine 
##             84             84             83             83             83 
##        medical       transfer         public          among          cases 
##             81             81             80             79             77 
##          human        feature            new        twitter     algorithms 
##             75             73             73             73             72 
##        digital          media      potential    respiratory     monitoring 
##             71             71             71             70             67 
##         cohort         hybrid      automatic     evaluation  convolutional 
##             66             66             65             65             64 
##     assessment          blood     healthcare   hospitalized           role 
##             62             62             62             60             60 
##      sentiment       outcomes     systematic    identifying        imaging 
##             60             59             58             57             57

Display the result as a word cloud of the 100 most frequent words.

word.top100 <- sort(word.count, decreasing = TRUE)[1:100]
wordcloud(names(word.top100), freq = word.top100, colors = brewer.pal(8, "Dark2"))

Analysis with word2vec

This section performs an analysis with Word2vec. For background on Word2vec, see the original paper (https://arxiv.org/pdf/1301.3781.pdf) and an explanatory paper (https://arxiv.org/pdf/1411.2738.pdf); the latter is easier to follow. There are also many blog posts, for example https://israelg99.github.io/2017-03-23-Word2Vec-Explained/ and https://towardsdatascience.com/word2vec-explained-49c52b4ccb71.

First, prepare the data.

x <- txt_clean_word2vec(df$abst)

Next, use the word2vec function to learn the relationships between words. The skip-gram algorithm is used here.

model <- word2vec(x, type = "skip-gram", dim = 30, window = 5, iter = 10)
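As a rough illustration of what window means for skip-gram, the sketch below enumerates the (center, context) word pairs that a window produces on a toy sentence. It shows only the pairing step, not the training itself; skipgram_pairs is a hypothetical helper, and the toy sentence and window of 1 are chosen for brevity (the call above uses window = 5).

```r
# Enumerate (center, context) pairs within `window` positions of each token.
# `skipgram_pairs` is a hypothetical helper, not part of the word2vec package.
skipgram_pairs <- function(tokens, window = 2) {
  n <- length(tokens)
  pairs <- lapply(seq_len(n), function(i) {
    # Neighbouring positions inside the window, excluding the center itself.
    ctx <- setdiff(max(1, i - window):min(n, i + window), i)
    data.frame(center = rep(tokens[i], length(ctx)),
               context = tokens[ctx],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, pairs)
}

tokens <- c("deep", "learning", "detects", "covid", "pneumonia")
pairs <- skipgram_pairs(tokens, window = 1)
pairs
```

Skip-gram learns vectors by predicting the context word from the center word across pairs like these, so words that share contexts end up close together.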

Display the result. Each word is represented as a point in the vector space.

head(as.matrix(model), 10)

##                    [,1]       [,2]        [,3]       [,4]         [,5]
## access        1.8834980 -2.3609595  1.13500381  1.4927373  0.542817116
## stability    -0.8966581 -1.0324870 -0.31039262 -0.5030487  0.038705207
## eliminating   1.5642663 -0.6669895  0.34348664  0.3617151  1.166107416
## ligands       1.6098195 -0.4321874  0.60072803  0.1811531  0.224953234
## represent     0.3369185 -1.8443726  0.04610289  0.6622514 -1.278716207
## spanning      0.7340813  0.8242792  1.36835647  0.6356338 -0.723508716
## outpatient    1.7866665 -0.2589226  1.98418868  1.2449813  0.883127272
## degeneration  0.5019498  0.3133734  1.40740240  0.9938579 -0.002220026
## unparalleled  0.8866035  0.6266905  0.93503529  0.8410850  0.064271405
## insights      1.3087177 -1.1826060 -0.55142534  0.5043691 -0.693867624
##                     [,6]        [,7]        [,8]        [,9]       [,10]
## access       -0.02805615 -1.55603015  0.38960561  0.41539526 -0.45791730
## stability    -1.29562759 -0.26410103 -0.48054999  1.57208276 -0.20944515
## eliminating  -0.02478027 -1.10254073  0.07579076  1.66908717  0.52693337
## ligands      -0.64256972  0.03663487  0.06771453  0.46551913 -0.86478013
## represent     0.38264179 -1.02044368  1.46145773  1.09471750 -2.01458192
## spanning     -0.42140484  1.35439456  1.62295806  0.04233977 -0.16239795
## outpatient    0.89739275  1.31969237  0.08918275  1.53049541 -0.04111318
## degeneration  1.73076141 -0.23754048 -1.40735209 -0.02257124 -2.45811152
## unparalleled -0.59475613 -1.25391376  1.05681419  0.48143473 -1.14191377
## insights     -0.42294213 -0.44819540  0.36971584 -0.29888809 -1.18783474
##                   [,11]      [,12]       [,13]      [,14]       [,15]
## access       -1.5041744 -1.0554111 -1.08506060 -0.2090475 -0.42590329
## stability    -1.1446419 -0.6541144 -0.46916291  1.6136422  1.30108607
## eliminating  -0.2792227 -0.1273677  1.76837897  2.1591556 -0.17860299
## ligands      -0.2024247  1.4765788 -1.02697039  1.8706492  1.76170230
## represent     0.1696163  0.2273275  0.10218593  0.9292352 -0.02506509
## spanning      0.6705174 -0.8349549 -0.13178769 -0.1140475  1.80735731
## outpatient   -0.7826992 -1.1549877 -0.22847484 -0.9016120 -0.75434852
## degeneration -1.1211474 -0.4919341 -0.39300355  0.2923086 -1.01190710
## unparalleled -1.6372910 -1.6781262 -0.09766967 -0.6491150  0.11443575
## insights      0.3033503 -0.6893293  1.39802897 -0.1052159  0.05170381
##                   [,16]       [,17]      [,18]      [,19]       [,20]     [,21]
## access       -0.4937030 -0.35358435  0.1562183 0.34102646  1.06768548 1.4833052
## stability     0.8501904  1.03535473 -1.5515784 2.87455153  0.31837282 0.3806680
## eliminating   0.1075667  0.27414146 -1.6702029 0.93148786  0.70044762 1.5865083
## ligands       0.8309996  0.36098820 -0.8173749 2.23008323  0.69616765 0.8623462
## represent    -1.2056096 -0.62799406 -1.7664572 1.03400826 -0.41982606 0.5974448
## spanning     -0.8194700  0.40543330  1.1067102 0.85468942  0.12250974 0.9896046
## outpatient    1.1913366 -0.01630457  0.7075865 0.02031492  0.69213843 0.8853406
## degeneration -1.3654140 -0.10030320  0.4742510 0.78608686 -0.22139725 1.4619242
## unparalleled  2.0729208 -0.33632502 -0.7768776 0.71200866  0.05488064 1.3273437
## insights      0.5510238  0.02700333 -1.0044469 1.57278991  0.04942456 2.7856309
##                    [,22]      [,23]      [,24]      [,25]       [,26]
## access        1.05134892 -0.3543068  0.1574571 -0.7036515  0.02190759
## stability    -1.50188696 -0.6507681 -0.5047027  0.2460931  0.10925783
## eliminating  -1.23322070 -0.7997144  0.1323705  0.7459010  0.04732146
## ligands      -0.66868216 -0.4972401 -2.0995812  0.5459074  0.39897972
## represent     0.06937215 -1.2207992 -0.3988610  0.8187262  0.03318119
## spanning     -0.95820498  1.1151983 -0.2620923 -1.9664277  0.46002764
## outpatient   -0.19677567  1.1265121  1.9988035 -1.3363581 -0.02731497
## degeneration -0.36374572 -1.3789468  1.4970579  0.3939926 -0.88063055
## unparalleled  1.77237880 -0.4483878  1.0660981 -0.4135475 -0.58891791
## insights      0.68586522 -1.1481521  1.1800277  1.0301284 -1.45251751
##                    [,27]       [,28]       [,29]       [,30]
## access        1.18207002  0.90332258 -1.23154032  0.03696854
## stability     0.35127428  0.28652859 -0.09005820  1.04790318
## eliminating   0.03140764 -0.75984484  0.01902853  1.79779601
## ligands      -0.44234350  0.03135437  0.95867807  1.07195950
## represent     1.52137721  0.11871677 -1.88761401  0.29222265
## spanning      2.20786238 -0.84908479 -0.37805405  1.01536453
## outpatient    0.53431118 -1.04972351 -0.26777437 -0.67968041
## degeneration  1.05353963 -0.52781159 -0.87775630  0.75818646
## unparalleled  0.13198513  1.00854647 -1.83406508  0.18777904
## insights      1.65858340  0.02616262  0.43860298 -0.67190403

List the 30 words most similar to “mask”.

nn <- predict(model, c("mask"), type = "nearest", top_n = 30)
nn

## $mask
##    term1         term2 similarity rank
## 1   mask       wearing  0.9168410    1
## 2   mask         masks  0.8957075    2
## 3   mask      improper  0.8813931    3
## 4   mask          face  0.8736871    4
## 5   mask          ddos  0.8611960    5
## 6   mask      mandates  0.8553889    6
## 7   mask    violations  0.8536556    7
## 8   mask       mandate  0.8481603    8
## 9   mask    distancing  0.8362848    9
## 10  mask          wear  0.8217394   10
## 11  mask        masked  0.8132742   11
## 12  mask    passengers  0.8108311   12
## 13  mask      facemask  0.8093920   13
## 14  mask        stores  0.8080394   14
## 15  mask  recommending  0.8063456   15
## 16  mask     universal  0.8052245   16
## 17  mask         hands  0.8020772   17
## 18  mask         usage  0.8015509   18
## 19  mask       vehicle  0.7990443   19
## 20  mask         norms  0.7957740   20
## 21  mask      distance  0.7943754   21
## 22  mask effectiveness  0.7909495   22
## 23  mask       closure  0.7897944   23
## 24  mask          slow  0.7890968   24
## 25  mask       perfect  0.7890829   25
## 26  mask   overfitting  0.7890803   26
## 27  mask         touch  0.7884932   27
## 28  mask   respirators  0.7874386   28
## 29  mask          fake  0.7872146   29
## 30  mask          bans  0.7866435   30
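A ranking like the one above can be illustrated with cosine similarity on toy vectors. This is a sketch under an assumption: cosine similarity is one common measure for word vectors, and the exact measure used by predict(..., type = "nearest") is not confirmed here. The three-dimensional vectors are made up.

```r
# Cosine similarity between two numeric vectors.
cosine_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Made-up 3-dimensional word vectors (real ones here have 30 dimensions).
vecs <- rbind(
  mask    = c(1, 0, 1),
  masks   = c(2, 0, 2),
  vaccine = c(0, 1, 0),
  face    = c(1, 1, 1)
)

query <- vecs["mask", ]
others <- vecs[rownames(vecs) != "mask", , drop = FALSE]

# Similarity of every other word to the query, highest first.
sims <- apply(others, 1, cosine_sim, b = query)
sort(sims, decreasing = TRUE)
```

In this toy example "masks" points in the same direction as "mask" and ranks first, while "vaccine" is orthogonal and ranks last.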

List the 30 words most similar to “omicron”.

nn <- predict(model, c("omicron"), type = "nearest", top_n = 30)
nn

## $omicron
##      term1     term2 similarity rank
## 1  omicron   variant  0.9722302    1
## 2  omicron  variants  0.9365270    2
## 3  omicron     delta  0.9097216    3
## 4  omicron   strains  0.8974876    4
## 5  omicron emergence  0.8811155    5
## 6  omicron     alpha  0.8700840    6
## 7  omicron       voc  0.8692313    7
## 8  omicron ancestral  0.8593056    8
## 9  omicron   mutated  0.8506261    9
## 10 omicron      zika  0.8488913   10
## 11 omicron  epidemic  0.8485928   11
## 12 omicron causative  0.8456267   12
## 13 omicron  zoonotic  0.8451722   13
## 14 omicron outbreaks  0.8441400   14
## 15 omicron  emerging  0.8428910   15
## 16 omicron    mutant  0.8427690   16
## 17 omicron  outburst  0.8424161   17
## 18 omicron      beta  0.8403398   18
## 19 omicron  lineages  0.8382600   19
## 20 omicron   evolves  0.8371908   20
## 21 omicron     virus  0.8370464   21
## 22 omicron  hotspots  0.8367106   22
## 23 omicron     theme  0.8358337   23
## 24 omicron      cov2  0.8354232   24
## 25 omicron    emerge  0.8332907   25
## 26 omicron       617  0.8317066   26
## 27 omicron  emergent  0.8308727   27
## 28 omicron   evolved  0.8293188   28
## 29 omicron epidemics  0.8276324   29
## 30 omicron mutations  0.8270133   30

List the 30 words most similar to “model”.

nn <- predict(model, c("model"), type = "nearest", top_n = 30)
nn

## $model
##    term1               term2 similarity rank
## 1  model           algorithm  0.9146559    1
## 2  model            stacking  0.9143786    2
## 3  model              models  0.9107108    3
## 4  model                 oga  0.9090107    4
## 5  model       configuration  0.9041815    5
## 6  model            ensemble  0.9033030    6
## 7  model                 gpr  0.9030290    7
## 8  model          classifier  0.9019660    8
## 9  model               avedl  0.8963149    9
## 10 model                dcnn  0.8960069   10
## 11 model                gbts  0.8959891   11
## 12 model            catboost  0.8930810   12
## 13 model                 dws  0.8903774   13
## 14 model             xgboost  0.8893422   14
## 15 model               tunes  0.8878289   15
## 16 model              cgenet  0.8865037   16
## 17 model lightefficientnetv2  0.8841574   17
## 18 model             stacked  0.8827171   18
## 19 model                best  0.8810557   19
## 20 model             learner  0.8806218   20
## 21 model              hybrid  0.8787336   21
## 22 model              cubist  0.8771569   22
## 23 model                lgbm  0.8756057   23
## 24 model                  df  0.8753271   24
## 25 model                 bls  0.8746838   25
## 26 model                 udl  0.8738938   26
## 27 model     hyperparameters  0.8729782   27
## 28 model              arimax  0.8706847   28
## 29 model          prediction  0.8693837   29
## 30 model                serp  0.8678178   30

Next, for the words used most often in the titles (the top 100 by frequency), visualize their relative positions in the vector space with t-SNE.

selector <- names(word.top100) %in% rownames(as.matrix(model))
dm <- as.matrix(model)[names(word.top100)[selector],]
word <- rownames(dm)
tsne.dm <- Rtsne(dm)
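Because some of the top 100 title words may be missing from the model vocabulary, dm can have fewer than 100 rows, and t-SNE's perplexity must be small relative to the number of points. The sketch below picks a perplexity that fits; safe_perplexity is a hypothetical helper, and the constraint 3 * perplexity <= n - 1 is an assumption about Rtsne's input check that should be verified against its documentation.

```r
# Largest perplexity allowed for n points, capped at a target value.
# Assumes the constraint 3 * perplexity <= n - 1.
# `safe_perplexity` is a hypothetical helper, not part of Rtsne.
safe_perplexity <- function(n, target = 30) {
  min(target, floor((n - 1) / 3))
}

safe_perplexity(100)  # 30
safe_perplexity(50)   # 16

# Hypothetical usage (not run here; requires the Rtsne package and `dm`):
# set.seed(1)  # t-SNE is stochastic, so fix the seed for a repeatable layout
# tsne.dm <- Rtsne(dm, perplexity = safe_perplexity(nrow(dm)))
```

Setting a seed matters because repeated runs of t-SNE otherwise give different layouts for the same input.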

Visualize the result.

df <- data.frame(tsne.dm$Y, word)
plot_ly(df, x = ~X1, y = ~ X2, type = "scatter", mode = "text", text = word)

Next, using the learned word relationships, embed the documents in the 30-dimensional space.

x <- data.frame(doc_id = titl, text = abst)
emb <- doc2vec(model, x, type = "embedding")
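The idea behind a document embedding built from word vectors can be sketched as follows. This is a simplified illustration that averages the vectors of known words; the scaling that doc2vec itself applies may differ. embed_doc and the two-dimensional toy vectors are hypothetical.

```r
# Made-up 2-dimensional word vectors.
word_vecs <- rbind(
  covid   = c(1, 0),
  variant = c(0, 1),
  model   = c(1, 1)
)

# Average the vectors of the words found in the vocabulary.
# `embed_doc` is a hypothetical helper, not part of the word2vec package.
embed_doc <- function(text, vectors) {
  tokens <- strsplit(tolower(text), "\\s+")[[1]]
  known <- tokens[tokens %in% rownames(vectors)]
  # A document with no known words has no embedding.
  if (length(known) == 0) return(rep(NA_real_, ncol(vectors)))
  colMeans(vectors[known, , drop = FALSE])
}

embed_doc("COVID variant unknownword", word_vecs)  # 0.5 0.5
embed_doc("nothing here", word_vecs)               # NA NA
```

Words outside the vocabulary are ignored, so a document made up only of unseen words yields no usable embedding.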

Use a description of Omicron (from the WHO website) as the query and retrieve the most similar documents. https://www.who.int/news/item/28-11-2021-update-on-omicron

q <- txt_clean_word2vec("On 26 November 2021, WHO designated the variant B.1.1.529 a variant of concern, named Omicron, on the advice of WHO’s Technical Advisory Group on Virus Evolution (TAG-VE).  This decision was based on the evidence presented to the TAG-VE that Omicron has several mutations that may have an impact on how it behaves, for example, on how easily it spreads or the severity of illness it causes. Here is a summary of what is currently known. ")
# from https://www.who.int/news/item/28-11-2021-update-on-omicron
newdoc <- doc2vec(model, q)
sim <- word2vec_similarity(emb, newdoc)
names(sim) <- rownames(emb)
sort(sim, decreasing = T)[1:20]

##                                                                                 The COVID-19 pandemic: prediction study based on machine learning models. 
##                                                                                                                                                 0.9925448 
##                                         Pandemic coronavirus disease (Covid-19): World effects analysis and prediction using machine-learning techniques. 
##                                                                                                                                                 0.9881864 
##                                                                    Chemo-Preventive Effect of Vegetables and Fruits Consumption on the COVID-19 Pandemic. 
##                                                                                                                                                 0.9880966 
##                                                                   Role of Imaging and AI in the Evaluation of COVID-19 Infection: A Comprehensive Survey. 
##                                                                                                                                                 0.9876175 
##                                          Advanced Deep Learning Algorithms for Infectious Disease Modeling Using Clinical Data: A Case Study on COVID-19. 
##                                                                                                                                                 0.9872560 
##                                                                 A comprehensive review on variants of SARS-CoVs-2: Challenges, solutions and open issues. 
##                                                                                                                                                 0.9869946 
##                              Regressive Class Modelling for Predicting Trajectories of COVID-19 Fatalities Using Statistical and Machine Learning Models. 
##                                                                                                                                                 0.9868742 
##                                   COVID-19 in Bangladesh: A Deeper Outlook into The Forecast with Prediction of Upcoming Per Day Cases Using Time Series. 
##                                                                                                                                                 0.9867825 
##                                                                Real-time measurement of the uncertain epidemiological appearances of COVID-19 infections. 
##                                                                                                                                                 0.9867129 
##                                          Prediction and forecasting of worldwide corona virus (COVID-19) outbreak using time series and machine learning. 
##                                                                                                                                                 0.9866893 
##                                                                                         Covidex: An ultrafast and accurate tool for SARS-CoV-2 subtyping. 
##                                                                                                                                                 0.9864149 
##                                                                      Coronaviruses and people with intellectual disability: an exploratory data analysis. 
##                                                                                                                                                 0.9863524 
##                                          Variant-driven early warning via unsupervised machine learning analysis of spike protein mutations for COVID-19. 
##                                                                                                                                                 0.9862817 
##                                                                 Diagnosis of COVID-19 and non-COVID-19 patients by classifying only a single cough sound. 
##                                                                                                                                                 0.9862552 
## A Hybrid Protocol for Identifying Comorbidity-Based Potential Drugs for COVID-19 Using Biomedical Literature Mining, Network Analysis, and Deep Learning. 
##                                                                                                                                                 0.9859721 
##                   Prediction of COVID-19 Pandemic in Bangladesh: Dual Application of Susceptible-Infective-Recovered (SIR) and Machine Learning Approach. 
##                                                                                                                                                 0.9857951 
##                                                                                  Analysis on novel coronavirus (COVID-19) using machine learning methods. 
##                                                                                                                                                 0.9857901 
##         MonkeyPox2022Tweets: A Large-Scale Twitter Dataset on the 2022 Monkeypox Outbreak, Findings from Analysis of Tweets, and Open Research Questions. 
##                                                                                                                                                 0.9857859 
##                                                Origin of novel coronavirus causing COVID-19: A computational biology study using artificial intelligence. 
##                                                                                                                                                 0.9855705 
##                                                                           Time series forecasting of COVID-19 transmission in Canada using LSTM networks. 
##                                                                                                                                                 0.9855676

Use the abstract of a paper on neural networks for COVID-19 as the query and retrieve similar documents. https://pubmed.ncbi.nlm.nih.gov/34745319/

my.abst <- txt_clean_word2vec("Recently, people around the world are being vulnerable to the pandemic effect of 
the novel Corona Virus. It is very difficult to detect the virus infected chest 
X-ray (CXR) image during early stages due to constant gene mutation of the 
virus. It is also strenuous to differentiate between the usual pneumonia from 
the COVID-19 positive case as both show similar symptoms. This paper proposes a 
modified residual network based enhancement (ENResNet) scheme for the visual 
clarification of COVID-19 pneumonia impairment from CXR images and 
classification of COVID-19 under deep learning framework. Firstly, the residual 
image has been generated using residual convolutional neural network through 
batch normalization corresponding to each image. Secondly, a module has been 
constructed through normalized map using patches and residual images as input. 
The output consisting of residual images and patches of each module are fed into 
the next module and this goes on for consecutive eight modules. A feature map is 
generated from each module and the final enhanced CXR is produced via 
up-sampling process. Further, we have designed a simple CNN model for automatic 
detection of COVID-19 from CXR images in the light of 'multi-term loss' function 
and 'softmax' classifier in optimal way. The proposed model exhibits better 
result in the diagnosis of binary classification (COVID vs. Normal) and 
multi-class classification (COVID vs. Pneumonia vs. Normal) in this study. The 
suggested ENResNet achieves a classification accuracy 99.7% and 98.4% for binary 
classification and multi-class detection respectively in comparison with 
state-of-the-art methods.")
# Ghosh and Ghosh (2022) ENResNet: A novel residual neural network for chest X-ray enhancement based COVID-19 detection. Biomed Signal Process doi: 10.1016/j.bspc.2021.103286.
# PMID: 34745319
newdoc <- doc2vec(model, my.abst)
sim <- word2vec_similarity(emb, newdoc)
names(sim) <- rownames(emb)
sort(sim, decreasing = T)[1:20]

##                                                                         COVID-19 Detection Based on Image Regrouping and Resnet-SVM Using Chest X-Ray Images. 
##                                                                                                                                                     0.9947877 
##                                                                COVID-19 deep classification network based on convolution and deconvolution local enhancement. 
##                                                                                                                                                     0.9944000 
##                             An automated and fast system to identify COVID-19 from X-ray radiograph of the chest using image processing and machine learning. 
##                                                                                                                                                     0.9930635 
##                                           Using handpicked features in conjunction with ResNet-50 for improved detection of COVID-19 from chest X-ray images. 
##                                                                                                                                                     0.9930370 
##                                       CAD systems for COVID-19 diagnosis and disease stage classification by segmentation of infected regions from CT images. 
##                                                                                                                                                     0.9925620 
##                                          Detection and classification of lung diseases for pneumonia and Covid-19 using machine and deep learning techniques. 
##                                                                                                                                                     0.9924851 
##                                                                         Segmenting lung lesions of COVID-19 from CT images via pyramid pooling improved Unet. 
##                                                                                                                                                     0.9921300 
##                                                                               Multi-task multi-modality SVM for early COVID-19 Diagnosis using chest CT data. 
##                                                                                                                                                     0.9917714 
## Fast and Accurate Detection of COVID-19 Along With 14 Other Chest Pathologies Using a Multi-Level Classification: Algorithm Development and Validation Study. 
##                                                                                                                                                     0.9917466 
##                                                         Classification of COVID-19 chest X-Ray and CT images using a type of dynamic CNN modification method. 
##                                                                                                                                                     0.9910857 
##                                Improving the performance of CNN to predict the likelihood of COVID-19 using chest X-ray images with preprocessing algorithms. 
##                                                                                                                                                     0.9909081 
##       Fully automatic pipeline of convolutional neural networks and capsule networks to distinguish COVID-19 from community-acquired pneumonia via CT images. 
##                                                                                                                                                     0.9908202 
##                                                            A deep learning based approach for automatic detection of COVID-19 cases using chest X-ray images. 
##                                                                                                                                                     0.9908165 
##                               [Research on coronavirus disease 2019 (COVID-19) detection method based on depthwise separable DenseNet in chest X-ray images]. 
##                                                                                                                                                     0.9907997 
##                                                                Multi-branch fusion auxiliary learning for the detection of pneumonia from chest X-ray images. 
##                                                                                                                                                     0.9906843 
##                                                                                    FAM: focal attention module for lesion segmentation of COVID-19 CT images. 
##                                                                                                                                                     0.9906176 
##                                                                                        FBSED based automatic diagnosis of COVID-19 using X-ray and CT images. 
##                                                                                                                                                     0.9905279 
##                                                         An optimized KELM approach for the diagnosis of <i>COVID-19</i> from 2D-SSA reconstructed CXR Images. 
##                                                                                                                                                     0.9904189 
##                                                                            COVID-19 Detection from Chest X-ray Images Using Feature Fusion and Deep Learning. 
##                                                                                                                                                     0.9903693 
##                                                                                          COVID Detection From Chest X-Ray Images Using Multi-Scale Attention. 
##                                                                                                                                                     0.9903296
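The ranking step can be illustrated in miniature with base R alone. The following is a minimal sketch, not the method used by `word2vec_similarity`: it assumes a toy embedding matrix with made-up values (`toy.emb`, `toy.new`) and a hypothetical helper `cosine_sim`, and it uses cosine similarity. It only shows the idea of scoring every stored document against one new document vector and sorting the scores.

```r
# Toy document embeddings (hypothetical values): one row per document,
# with row names playing the role of article titles.
toy.emb <- rbind(
  "chest X-ray CNN"   = c(0.9, 0.1, 0.2),
  "CT segmentation"   = c(0.7, 0.3, 0.1),
  "vaccine sentiment" = c(0.1, 0.9, 0.4)
)

# Embedding of a new document (hypothetical values), stored as a
# one-row matrix in the same vector space.
toy.new <- matrix(c(0.8, 0.2, 0.2), nrow = 1)

# Hypothetical helper: cosine similarity between each row of x
# and the single row of y.
cosine_sim <- function(x, y) {
  num <- as.vector(x %*% t(y))                  # dot products
  den <- sqrt(rowSums(x^2)) * sqrt(sum(y^2))    # product of vector norms
  setNames(num / den, rownames(x))
}

toy.sim <- cosine_sim(toy.emb, toy.new)

# Rank documents from most to least similar to the new document.
toy.rank <- sort(toy.sim, decreasing = TRUE)
print(toy.rank)
```

In this toy example the two imaging-related rows rank above the unrelated one. In the real analysis the top 20 scores all fall between about 0.990 and 0.995, so the order of the titles is more informative than the absolute values.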

Biometrics Lecture 11: Text Mining

岩田洋佳 hiroiwata@g.ecc.u-tokyo.ac.jp

2022-12-26
