[1] "Computing LOOs..."
|
| | 0%
|
|= | 1%
|
|= | 2%
|
|== | 2%
|
|== | 3%
|
|== | 4%
|
|=== | 4%
|
|==== | 5%
|
|==== | 6%
|
|===== | 6%
|
|===== | 7%
|
|===== | 8%
|
|====== | 8%
|
|====== | 9%
|
|======= | 10%
|
|======== | 11%
|
|======== | 12%
|
|========= | 12%
|
|========= | 13%
|
|========= | 14%
|
|========== | 14%
|
|========== | 15%
|
|=========== | 16%
|
|============ | 16%
|
|============ | 17%
|
|============ | 18%
|
|============= | 18%
|
|============= | 19%
|
|============== | 20%
|
|=============== | 21%
|
|=============== | 22%
|
|================ | 22%
|
|================ | 23%
|
|================ | 24%
|
|================= | 24%
|
|================== | 25%
|
|================== | 26%
|
|=================== | 26%
|
|=================== | 27%
|
|=================== | 28%
|
|==================== | 28%
|
|==================== | 29%
|
|===================== | 30%
|
|====================== | 31%
|
|====================== | 32%
|
|======================= | 32%
|
|======================= | 33%
|
|======================= | 34%
|
|======================== | 34%
|
|======================== | 35%
|
|========================= | 36%
|
|========================== | 36%
|
|========================== | 37%
|
|========================== | 38%
|
|=========================== | 38%
|
|=========================== | 39%
|
|============================ | 40%
|
|============================= | 41%
|
|============================= | 42%
|
|============================== | 42%
|
|============================== | 43%
|
|============================== | 44%
|
|=============================== | 44%
|
|================================ | 45%
|
|================================ | 46%
|
|================================= | 46%
|
|================================= | 47%
|
|================================= | 48%
|
|================================== | 48%
|
|================================== | 49%
|
|=================================== | 50%
|
|==================================== | 51%
|
|==================================== | 52%
|
|===================================== | 52%
|
|===================================== | 53%
|
|===================================== | 54%
|
|====================================== | 54%
|
|====================================== | 55%
|
|======================================= | 56%
|
|======================================== | 56%
|
|======================================== | 57%
|
|======================================== | 58%
|
|========================================= | 58%
|
|========================================= | 59%
|
|========================================== | 60%
|
|=========================================== | 61%
|
|=========================================== | 62%
|
|============================================ | 62%
|
|============================================ | 63%
|
|============================================ | 64%
|
|============================================= | 64%
|
|============================================== | 65%
|
|============================================== | 66%
|
|=============================================== | 66%
|
|=============================================== | 67%
|
|=============================================== | 68%
|
|================================================ | 68%
|
|================================================ | 69%
|
|================================================= | 70%
|
|================================================== | 71%
|
|================================================== | 72%
|
|=================================================== | 72%
|
|=================================================== | 73%
|
|=================================================== | 74%
|
|==================================================== | 74%
|
|==================================================== | 75%
|
|===================================================== | 76%
|
|====================================================== | 76%
|
|====================================================== | 77%
|
|====================================================== | 78%
|
|======================================================= | 78%
|
|======================================================= | 79%
|
|======================================================== | 80%
|
|========================================================= | 81%
|
|========================================================= | 82%
|
|========================================================== | 82%
|
|========================================================== | 83%
|
|========================================================== | 84%
|
|=========================================================== | 84%
|
|============================================================ | 85%
|
|============================================================ | 86%
|
|============================================================= | 86%
|
|============================================================= | 87%
|
|============================================================= | 88%
|
|============================================================== | 88%
|
|============================================================== | 89%
|
|=============================================================== | 90%
|
|================================================================ | 91%
|
|================================================================ | 92%
|
|================================================================= | 92%
|
|================================================================= | 93%
|
|================================================================= | 94%
|
|================================================================== | 94%
|
|================================================================== | 95%
|
|=================================================================== | 96%
|
|==================================================================== | 96%
|
|==================================================================== | 97%
|
|==================================================================== | 98%
|
|===================================================================== | 98%
|
|===================================================================== | 99%
|
|======================================================================| 100%
[1] "Performing the selection using all the data.."
[1] "20% of terms selected."
[1] "50% of terms selected."
[1] "70% of terms selected."
[1] "100% of terms selected."
[1] "Done."
---
title: "Chart Stack (Scrolling)"
Author: "Hyunsu Ju"
output:
flexdashboard::flex_dashboard:
theme: sandstone
vertical_layout: scroll
social: ["facebook","twitter","menu"]
source_code: embed
navbar:
- {title: "About", href: "https://stat.columbia.edu/~gelman/research/unpublished/loo_stan.pdf", align: left}
---
```{r setup, include=FALSE}
library(flexdashboard)
library(rstanarm)
options(mc.cores = parallel::detectCores())
library(loo)
library(tidyverse)
library(GGally)
library(bayesplot)
theme_set(bayesplot::theme_default())
library(projpred)
SEED=87
```
Page 1
===========
### Chart A
```{r, fig.width=15, fig.height=7}
set.seed(SEED)
df <- tibble(
pos.tot = runif(200,min=0.8,max=1.0),
urban.tot = pmin(runif(200,min=0.0,max=0.02),1.0 - pos.tot),
neg.tot = (1.0 - pmin(pos.tot + urban.tot,1)),
x1= pmax(pos.tot - runif(200,min=0.05,max=0.30),0),
x3= pmax(neg.tot - runif(200,min=0.0,max=0.10),0),
x2= pmax(pos.tot - x1 - x3/2,0),
x4= pmax(1 - x1 - x2 - x3 - urban.tot,0))
# true model and 200 Poisson observations
mean.y <- exp(-5.8 + 6.3*df$x1 + 15.2*df$x2)
df$y <- rpois(200,mean.y)
ggpairs(df,diag=list(continuous="barDiag"))
```
### Bayesian inference
From this point onwards we switch to a Bayesian approach. The rstanarm package provides the stan_glm function, which accepts the same arguments as glm but performs full Bayesian inference using Stan (mc-stan.org). By default, a weakly informative Gaussian prior is used for the regression weights.
```{r}
fitg <- stan_glm(y ~ x1 + x2 + x3 + x4, data = df, na.action = na.fail, family=poisson(), QR=TRUE, seed=SEED)
```
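To see exactly which priors were used (rstanarm rescales its default priors based on the data), `prior_summary()` prints them; this is an optional check added here:
```{r}
# Report the priors used in the fit, including any automatic rescaling
prior_summary(fitg)
```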
### Summary
```{r}
summary(fitg)
```
### Data Table
```{r}
DT::datatable(df,options=list(bPaginate=FALSE))
```
### Collinearity problem as with MLEs
```{r}
mcmc_areas(as.matrix(fitg),prob_outer = .99)
```
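The overlapping, wide marginals reflect strong posterior correlations between the coefficients. As an optional numerical check (added here as a sketch), the correlation matrix of the posterior draws makes this explicit:
```{r}
# Correlations of the posterior draws; entries near -1 or 1 signal the
# collinearity that inflates the marginal uncertainties
round(cor(as.matrix(fitg)), 2)
```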
### Improvement on predictions
With even more variables, some relevant and some irrelevant, it becomes difficult to analyse the joint posterior to see which variables are jointly relevant. We can easily test whether any of the covariates are useful by using cross-validation to compare against a null model.
```{r}
fitg0 <- stan_glm(y ~ 1, data = df, na.action = na.fail, family=poisson(), seed=SEED)
(loog <- loo(fitg))
(loog0 <- loo(fitg0))
loo_compare(loog0, loog)
```
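As an optional complement (a sketch using the loo package, not part of the original comparison), stacking-based model weights should concentrate almost entirely on the covariate model:
```{r}
# Stacking weights over the null and full models; the full model should
# receive essentially all of the weight
loo_model_weights(list(fitg0 = loog0, fitg = loog))
```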
Based on cross-validation, the covariates together contain significant information that improves the predictions.
### Variable selection
Piironen and Vehtari (2017) also show that a projection predictive approach can be used for model reduction, that is, choosing a smaller model with some coefficients set to 0. The projection predictive approach solves the problem of how to do inference after the selection: the full model posterior is projected onto the restricted subspace. See Piironen et al. (2018) for more details.
We perform the projection predictive variable selection using the previous full model. A fast leave-one-out cross-validation approach (Vehtari, Gelman and Gabry, 2017) is used to choose the model size.
```{r}
library(projpred)
fitg_cv <- cv_varsel(fitg, method='forward', cv_method='LOO',cores=16)
```
### Recommendation for the model size to choose based on a LOO
```{r}
plot(fitg_cv)
```
### Selection model results
```{r}
(nv <- suggest_size(fitg_cv, alpha=0.1))
projg <- project(fitg_cv, nv = nv, ns = 4000)
round(colMeans(as.matrix(projg)),1)
mcmc_areas(as.matrix(projg) ,prob_outer = .99)
```
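Because the data were simulated in the Chart A chunk with known coefficients (intercept -5.8, x1 6.3, x2 15.2 on the log scale), we can sanity-check the projected submodel against the truth; this comparison is an addition, not part of the original selection:
```{r}
# True data-generating coefficients from the simulation chunk on Page 1
truth <- c(Intercept = -5.8, x1 = 6.3, x2 = 15.2)
# Posterior means of the projected submodel should land close to these
round(colMeans(as.matrix(projg)), 1)
truth
```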
### Crude map of the world with capital cities
```{r}
library(ggplot2)
library(plotly)
library(maps)
library(rbokeh)
data(world.cities)
caps <- subset(world.cities, capital == 1)
caps$population <- prettyNum(caps$pop, big.mark = ",")
figure(width = 800, height = 450, padding_factor = 0) %>%
ly_map("world", col = "gray") %>%
ly_points(long, lat, data = caps, size = 5,
hover = c(name, country.etc, population))
```
### HighChart
A dynamic `Highcharts` chart.
```{r}
library(highcharter)
data(penguins, package = "palmerpenguins")
hchart(penguins, "scatter", hcaes(x=flipper_length_mm, y=bill_length_mm, group=species))
```
Page 2
===========
Row1
-----------------
### Articles per Day
```{r}
articles<-100
valueBox(articles, icon="fa-pencil")
```
### Comments per Day
```{r}
comments<-200
valueBox(comments, icon="fa-comments")
```
### Spam per Day
```{r}
spam<-300
valueBox(spam, icon="fa-trash",
color=ifelse(spam>10,"warning","primary"))
```
Row2
--------
### Contact Rate
```{r}
gauge(70, min = 0, max = 100, symbol = '%', gaugeSectors(
success = c(80, 100), warning = c(40, 79), danger = c(0, 39)
))
```
### Average Rating
```{r}
gauge(23, min = 0, max = 50, gaugeSectors(
success = c(41, 50), warning = c(21, 40), danger = c(0, 20)
))
```
### Cancellations
```{r}
gauge(7, min = 0, max = 10, gaugeSectors(
success = c(0, 2), warning = c(3, 6), danger = c(7, 10)
))
```
Page 3
===========
```{r}
library(plotly)
fig <- plot_ly(
domain = list(x = c(0, 1), y = c(0, 1)),
value = 270,
title = list(text = "Speed"),
type = "indicator",
mode = "gauge+number")
fig <- fig %>%
layout(margin = list(l=20,r=30))
fig
```
Page 4
=============
Row1
------------------
```{r}
library(tidyverse)
library(data.table)
library(dplyr)
library(ggplot2)
library(lubridate)
library(stringr)
library(plotly)
## Reading in files
library(igraph)
library(threejs)
library(ggraph)
library(tidygraph)
start.watch.date <- as.Date('2020-03-01')
watched.countries.count <- 12
covid <- data.table(read.csv('covid_19_data.csv'))
krcovid <- data.table(read.csv('PatientInfo.csv'))
# Fix mistyped years (2002 -> 2020) in confirmed_date
krcovid[,confirmed_date:=as.Date(gsub('2002','2020',as.character(confirmed_date)))]
krcovid[,symptom_onset_date:=as.Date(as.character(symptom_onset_date))]
krcovid[,released_date:=as.Date(as.character(released_date))]
krcovid[,deceased_date:=as.Date(as.character(deceased_date),format='%Y-%m-%d')]
#krcovid[,table(symptom_onset_date==confirmed_date)]
#summary(krcovid)
#head(covid)
krcovid[,closed_date:=coalesce(released_date,deceased_date)]
krcovid[,closed.days:=as.integer(difftime(closed_date,confirmed_date,units='days'))]
ggplot(krcovid[,.(n=.N,median_age=2020-median(birth_year,na.rm=T)),.(closed.days,confirmed_date,state)][,state:=ifelse(state=='deceased','death','recovered')])+
geom_point(aes(y=closed.days,x=confirmed_date,size=n,col=state))+
#geom_text(aes(y=closed.days,x=confirmed_date,label=median_age,col=state,hjust='left'))+
scale_x_date(date_labels = '%m/%d',date_breaks = '3 days')+
theme(axis.text.x=element_text(angle = 60, vjust = 0.5))+
scale_color_brewer(palette = 'Set1')+
labs(title='Days to close the cases in Korea')
covid[,Last.Update:=as.Date(Last.Update,format='%m/%d/%Y')]
covid[str_length(ObservationDate)==8,ObservationDate1:=as.Date(ObservationDate,format='%m/%d/%y')]
covid[str_length(ObservationDate)==10,ObservationDate1:=as.Date(ObservationDate,format='%m/%d/%Y')]
covid[,ObservationDate:=ObservationDate1]
covid$ObservationDate1=NULL
covid[grep('Korea',Country.Region),Country.Region:='Republic of Korea']
covid[grep('Iran',Country.Region),Country.Region:='Iran']
covid[grep('Taiwan',Province.State),Country.Region:='Taiwan']
covid[ grepl('Diamon',Province.State),Country.Region:='Diamond Princess']
str(covid)
# The JHU modified the country name for Taiwan, which is unfortunate both for the analysis and for human rights
## quick summary the cases by country
library(conflicted)
conflict_prefer("lag", "dplyr")
conflict_prefer("setdiff", "dplyr")
covid.by.country <- covid[,.(Confirmed=sum(Confirmed),Recovered=sum(Recovered),Deaths=sum(Deaths)),.(ObservationDate,Country.Region)]
setorder(covid.by.country,Country.Region,ObservationDate)
covid.by.country[,Daily.Confirmed:=Confirmed-lag(Confirmed,1),.(Country.Region)]
covid.by.country[,Daily.Recovered:=Recovered-lag(Recovered,1),.(Country.Region)]
covid.by.country[,Daily.Deaths:=Deaths-lag(Deaths,1),.(Country.Region)]
lookup.countries <- covid.by.country[order(-ObservationDate,Country.Region)][,head(.SD,1),.(Country.Region)][order(-Deaths),Country.Region]
lookup.countries <- setdiff(lookup.countries,grep('Diamon',lookup.countries,value=T))
```
Row2
------------------
```{r}
# Cap one extreme daily spike so the free-scale facets stay readable
ggplot(covid.by.country[grep('Italy|Korea',Country.Region)][,Daily.Confirmed:=ifelse(Daily.Confirmed>5000,2000,Daily.Confirmed)])+
geom_line(aes(x=ObservationDate,y=Daily.Recovered,col='Recovered'))+
geom_line(aes(x=ObservationDate,y=Daily.Deaths,col='Deaths'))+
geom_line(aes(x=ObservationDate,y=Daily.Confirmed,col='Confirmed'))+
facet_wrap(~Country.Region,scales = 'free',ncol=1)+
scale_x_date(date_labels = '%m/%d',date_breaks = '7 days')
```
Row3
------------------
```{r}
ggplot(covid.by.country[grep('Italy|Korea',Country.Region)][ObservationDate>'2020-03-01'])+
geom_bar(aes(x=ObservationDate,y=Daily.Deaths,fill='Deaths'),stat = 'identity')+
geom_line(aes(x=ObservationDate,y=Daily.Confirmed),col='blue')+
facet_wrap(~Country.Region,scales = 'free',ncol=1)+
scale_x_date(date_labels = '%m/%d',date_breaks = '2 days')
```
Row4
------------------
```{r}
ggplot(covid.by.country[Country.Region %in% lookup.countries[1:watched.countries.count]],aes(group=1))+
geom_line(aes(x=ObservationDate,y=Daily.Recovered,col='Recovered'))+
geom_line(aes(x=ObservationDate,y=Daily.Deaths,col='Deaths'))+
geom_line(aes(x=ObservationDate,y=Daily.Confirmed,col='Confirmed'))+
facet_wrap(~Country.Region,ncol=3,scales='free_y')+
scale_color_brewer(palette = 'Set1',name='Value Type')+
scale_x_date(date_breaks = '7 days',date_labels = '%m/%d')+
theme(axis.text.x=element_text(angle = 90, vjust = 0.5))+
labs(title='COVID-19 daily cases')
```