class: center, middle, inverse, title-slide # 探索COIVD-19資料 ## R 語言於流行病學資料之視覺化運用 ### 蘇睿寧 ### 2021-12-26 --- ## The Diagram of nCov2019 architecture <img src="nCov2019.png" width="68%" style="display: block; margin: auto;" /> --- ## Overviews ###- Data collection 數據收集 ###- Data query 資料查詢 ###- Data operation with Geographic maps 資料操作 ###- Interactive dashboard 互動式儀表板 --- ## {nCov2019}安裝package ###有時候資料下載會比較慢 ```r #remotes::install_github("yulab-smu/nCov2019", dependencies = TRUE) library("nCov2019") ``` ## Data collection ###-WorldoMeters 實時統計數據的網站 ###-Center for Systems Science and Engineering(CSSE) ###-website of raps.org 疫苗研發資料 --- <style type="text/css"> /* custom.css */ .left-code { color: #777; width: 30%; height: 150%; float: left; } .pull-left { float: left; width: 47%; } .pull-right { float: right; width: 47%; } </style> .pull-left[ ```r res <- query() ``` ``` ## Querying the latest data... ``` ``` ## last update: 2021-12-26 ``` ``` ## Querying the global data... ``` ``` ## Gloabl total 279848485 cases; and 5413544 deaths ## Gloabl total affect country or areas: 224 ## Gloabl total recovered cases: 59214 ## last update: 2021-12-26 ``` ``` ## Querying the historical data... ``` ``` ## Querying the vaccine data... ``` ``` ## Total Candidates Programs : 51 ``` ``` ## Querying the therapeutics data... ``` ``` ## Total Candidates Programs : 84 ``` ``` ## Query finish, each time you can launch query() to reflash the data ``` ] .pull-right[ ## Data query 啟動資料分類 ###- **the latest data** ###- **global data** ###- **historical data** ###- **vaccine/therapeutics data** ] ```r names(res) ``` ``` ## [1] "latest" "global" "historical" "vaccine" "therapeutics" ``` --- ## Data operation: Global data ```r x<- res$latest DT::datatable(x["Global"], rownames=FALSE, options=list(scrollY = 300, pageLength=10), fillContainer =TRUE) ```
```r #return all global countries ``` --- ## Data query: Taiwan as example ```r DT::datatable(x[c("Taiwan")], options=list(scrollY = 80, pageLength=10), fillContainer =FALSE) ```
```r # return only for Taiwan ``` <img src="nCov2019TTW.png" width="859" style="display: block; margin: auto;" /> ### 資料一致性高 --- ## Data query: Historical data of Taiwan .pull-left[ ```r Z<- res$historical print(Z) ``` ``` ## last update: 2021-12-25 ``` ```r tail(Z[c("Taiwan")]) ``` ``` ## country date cases deaths recovered ## 136981 Taiwan 2021-12-20 16816 850 0 ## 137177 Taiwan 2021-12-21 16826 850 0 ## 137373 Taiwan 2021-12-22 16840 850 0 ## 137569 Taiwan 2021-12-23 16853 850 0 ## 137765 Taiwan 2021-12-24 16873 850 0 ## 137961 Taiwan 2021-12-25 16891 850 0 ``` ] .pull-right[ <img src="historical.png" width="599" /> ] ### **雖有一點時間差,準確性仍高** --- ## Data query: Vaccine data ```r X <-res$ vaccine summary(X) ``` ``` ## phase candidates ## 1 Phase 3 10 ## 2 Phase 2/3 3 ## 3 Phase 2 2 ## 4 Phase 1/2 9 ## 5 Phase 1 13 ## 6 Pre-clinical 14 ``` --- ## Data query: Vaccine data(但找不到高端) ```r DT::datatable(X["all"], rownames=FALSE, options=list(scrollY = 300, pageLength=10), fillContainer =TRUE) ```
--- ## Data query: Vaccine data試驗細節 ```r X[ID="id1"] ``` <img src="vaccine.png" width="1081" style="display: block; margin: auto;" /> --- ## Data operation: Visualization (Confirmed Cases) .pull-left[ ```r plot( x, region="Global", continuous_scale = F, palette = "Reds", date = NULL, from = NULL, to = NULL, title = "COVID-19", type = "cases") ``` ] .pull-right[ <img src="global map.png" width="839" /> ] --- ## Data operation: Visualization (Recovered Cases) .pull-left[ ```r plot( x, region="Global", continuous_scale = F, palette = "Green", date = NULL, from = NULL, to = NULL, title = "COVID-19", type = "recovered" ) ``` ] .pull-left[ <img src="global map recov.png" width="839" /> ] --- ## Data operation: ggplot ###折線圖比較資料的趨勢 .pull-left[ ```r library(ggplot2) library(dplyr) X <- res$historical tmp <- X["global"] %>% group_by(country) %>% arrange(country,date) %>% mutate(diff = cases - lag(cases, default = first(cases))) %>% filter(country %in% c("Taiwan","Japan")) ggplot(tmp,aes(date, log(diff+1), color=country)) + geom_line() + labs(y="Log2(daily increase cases)") + theme(axis.text = element_text(angle = 15, hjust = 1)) + scale_x_date(date_labels = "%Y-%m-%d") + theme_minimal() ``` ] .pull-left[ <img src="confirmed TW.png" width="1041" /> ] --- ## Data operation: Animations plot (動畫圖示) ### 以顏色動態呈現個案確診個數在各地區的變化 .pull-left[ ```r library(nCov2019) res = query() from = "2021-11-01" to = "2021-11-30" y = res$historical plot(y, from = from, to=to) ``` ] .pull-right[ <img src="nCov2019.gif" width="200%" /> ] --- ## Data operation: Other plots(視覺累積圖) .pull-left[ ```r library(ggplot2) x <- res$historical d = x['Taiwan'] d = d[order(d$cases), ] ggplot(d, aes(date, cases)) + geom_col(fill = 'firebrick') + theme_minimal(base_size = 14) + xlab(NULL) + ylab(NULL) + scale_x_date(date_labels = "%Y/%m/%d") + labs(caption = paste("accessed date:", max(d$date))) ``` ] .pull-right[ <img src="confirmed TW2.png" width="1051" /> * **可看出來5-6月間台灣一波疫情大爆發** ] --- ## Data operation: Other plots(個案增加趨勢圖) .pull-left[ ```r library("dplyr") library("ggrepel") x <- res$latest y <- res$historical country_list = x["global"]$country[1:3] y[country_list] %>% subset( date > as.Date("2021-07-01") ) %>% group_by(country) %>% arrange(country,date) %>% mutate(increase = cases - lag(cases, default = first(cases))) -> df ggplot(df, aes(x=date, y=increase, color=country ))+ geom_smooth() + geom_label_repel(aes(label = paste(country,increase)), data = df[df$date == max(df$date), ], hjust = 1) + labs(x=NULL,y=NULL, title=paste("visualize the cumulative summary data of Taiwan"))+ theme_bw() + theme(legend.position = 'none') ``` ] .pull-right[ <img src="casestrajectory.png" width="840" /> * **Omicron自11月最早在南非被發現和確定後,個案數有增加的趨勢** ] --- ##Data operation: Taiwan 確診、復原及死亡數 .pull-left[ ```r library('tidyr') library('ggrepel') library('ggplot2') y <- res$historical country = "Taiwan" y[country] -> d d <- gather(d, curve, count, -date, -country) ggplot(d, aes(date, count, color = curve)) + geom_point() + geom_line() + labs(x=NULL,y=NULL,title=paste("Trend of cases, recovered and deaths in", country)) + scale_color_manual(values=c("#f39c12", "#dd4b39", "#00a65a")) + theme_bw() + geom_label_repel(aes(label = paste(curve,count)), data = d[d$date == max(d$date), ], hjust = 1) + theme(legend.position = "none",axis.text = element_text(angle = 15, hjust = 1)) + scale_x_date(date_labels = "%Y-%m-%d") ``` ] .pull-right[ <img src="trend TW.png" width="839" /> ] ```r require(dplyr) ``` ``` ## 載入需要的套件:dplyr ``` ``` ## ## 載入套件:'dplyr' ``` ``` ## 下列物件被遮斷自 'package:stats': ## ## filter, lag ``` ``` ## 下列物件被遮斷自 'package:base': ## ## intersect, setdiff, setequal, union ``` ```r require(ggplot2) ``` ``` ## 載入需要的套件:ggplot2 ``` ```r require(shadowtext) ``` ``` ## 載入需要的套件:shadowtext ``` --- ## Interactive dashboard(shiny APP) ```r dashboard() ``` <img src="dashboard.png" width="1249" /> --- ## Conclusion of R .pull-left[ ###- 資料串接可增加運用性 ###- 視覺性的圖像有助於理解流性病學的趨勢 ###- 疾管局亦有類似資料呈現方式 ] .pull-right[ <img src="cdc.png" width="571" /> ] --- ## Conclusion of myself .pull-left[ ###- R可以辦到的事很神奇 ###- R可以辦到的事情我大多辦不到 ###- RRRRRRRRRR... ] .pull-right[ <img src="happyNY.png" width="685" /> ]