Module_2_Simulation

1. Objective

This module analyzes and summarizes 1000 simulations of three tests for bimodality (Mclust, Hartigan Dip Test, and Bimodality Coefficient) applied to samples drawn from normal, Weibull, and beta distributions. Power and false positive rate (FP) are the evaluation criteria.
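The sketch below shows how the two criteria could be estimated. It assumes that power and FP are rejection rates over simulated data sets: power is the fraction of bimodal samples that a test flags as bimodal, and FP is the fraction of unimodal samples that it wrongly flags. The Bimodality Coefficient here uses its common textbook form and the conventional 5/9 cutoff. The functions `bimodality_coefficient()` and `rejection_rate()` and the two generators are hypothetical illustrations, not the code that produced this module's results. That code also records settings such as `nboot` that are not modeled here.

```r
# Hypothetical sketch: estimate power and FP as rejection rates, using the
# Bimodality Coefficient (BC) with the conventional 5/9 cutoff as the test.

# Sample BC = (G1^2 + 1) / (G2 + 3 (n-1)^2 / ((n-2)(n-3))),
# where G1 and G2 are the bias-adjusted skewness and excess kurtosis.
bimodality_coefficient <- function(x) {
  n  <- length(x)
  d  <- x - mean(x)
  m2 <- mean(d^2)
  g1 <- mean(d^3) / m2^1.5                                  # moment skewness
  g2 <- mean(d^4) / m2^2 - 3                                # moment excess kurtosis
  G1 <- g1 * sqrt(n * (n - 1)) / (n - 2)                    # adjusted skewness
  G2 <- ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))  # adjusted excess kurtosis
  (G1^2 + 1) / (G2 + 3 * (n - 1)^2 / ((n - 2) * (n - 3)))
}

# Fraction of n_sim simulated samples whose BC exceeds the cutoff.
rejection_rate <- function(n_sim, n, generator, cutoff = 5 / 9) {
  mean(replicate(n_sim, bimodality_coefficient(generator(n)) > cutoff))
}

# Illustrative generators (assumptions, not the module's parameter settings).
unimodal <- function(n) rnorm(n)                          # no bimodality present
bimodal  <- function(n) c(rnorm(n %/% 2, mean = 0),       # two well-separated
                          rnorm(n - n %/% 2, mean = 5))   # normal components

set.seed(2024)
fp_hat    <- rejection_rate(1000, 100, unimodal)  # estimated false positive rate
power_hat <- rejection_rate(1000, 100, bimodal)   # estimated power
c(power = power_hat, FP = fp_hat)
```

For a uniform distribution the coefficient is close to 5/9, which is why that value is the usual cutoff. A real test would be swapped in where `bimodality_coefficient()` is called.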

2. Introduction

All graphs are interactive. For example, clicking a legend entry hides or shows that distribution, so you can focus on one distribution or compare both. The toolbar in the upper-right corner also provides zoom and other controls.

Violin plots combined with box plots show the distribution of power or FP and make it easy to compare groups. The outline of each violin is a kernel density estimate of the underlying data. A wider section indicates a higher density of observations, and a narrower section indicates a lower density. Hovering over a plot displays summary statistics such as the minimum, maximum, mean, and median. The solid line marks the median and the dashed line marks the mean.
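The sketch below shows what the violin outline and the hover statistics correspond to. It computes a kernel density estimate with base R's `density()`, along with the same summary statistics, on an illustrative vector of power values. These are synthetic numbers, not results from this module. plotly computes its own density estimate in the browser, so its bandwidth, and therefore the exact curve, may differ.

```r
# Synthetic, illustrative power values forming two clusters (near 0 and near 1).
set.seed(42)
toy_power <- c(runif(50, 0, 0.2), runif(50, 0.8, 1))

# Gaussian kernel density estimate (default bandwidth rule: bw.nrd0).
# A violin's width at a given power value is proportional to the density there.
kde <- density(toy_power)

# The density is high inside each cluster and low in the gap between them,
# so the violin would be wide near 0.1 and 0.9 and narrow near 0.5.
peak_at   <- kde$x[which.max(kde$y)]
gap_value <- approx(kde$x, kde$y, xout = 0.5)$y

# Statistics of the kind shown on hover.
hover_stats <- c(min = min(toy_power), median = median(toy_power),
                 mean = mean(toy_power), max = max(toy_power))
round(hover_stats, 3)
```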

Scatter plots show the relationship between power and FP.

library(dplyr)   # data manipulation; re-exports the %>% pipe
library(plotly)  # interactive violin and scatter plots

3. Normal and Weibull Distribution

df_n_w <- readRDS("~/Nadeau/module2_n_w_sim_1000.rds")
df_n_w

##       Distribution Alpha  Sim nboot SampleSize Prop  mu1 sd1 mu2 sd2   N
##    1:         norm  0.05 1000   100         20  0.1 0.01   1   1   1  20
##    2:         norm  0.05 1000   100         20  0.1 0.01   1   1   1  20
##    3:         norm  0.05 1000   100         20  0.1 0.01   1   1   1  20
##    4:         weib  0.05 1000   100         20  0.1 0.01   1   1   1  20
##    5:         weib  0.05 1000   100         20  0.1 0.01   1   1   1  20
##   ---                                                                   
## 1796:         norm  0.05 1000   100        500  0.9 0.01   1   8   4 500
## 1797:         norm  0.05 1000   100        500  0.9 0.01   1   8   4 500
## 1798:         weib  0.05 1000   100        500  0.9 0.01   1   8   4 500
## 1799:         weib  0.05 1000   100        500  0.9 0.01   1   8   4 500
## 1800:         weib  0.05 1000   100        500  0.9 0.01   1   8   4 500
##                         Test power    FP
##    1:                 Mclust 0.062 0.066
##    2: Bimodality Coefficient 0.005 0.003
##    3:      Hartigan Dip Test 0.008 0.010
##    4:                 Mclust 0.716 0.781
##    5: Bimodality Coefficient 0.607 0.687
##   ---                                   
## 1796: Bimodality Coefficient 1.000 0.000
## 1797:      Hartigan Dip Test 0.000 0.000
## 1798:                 Mclust 0.000 0.000
## 1799: Bimodality Coefficient 1.000 1.000
## 1800:      Hartigan Dip Test 0.000 0.000

3.1 Violin Plot for Test vs Power

fig_1 <- df_n_w %>%
  plot_ly(type = "violin")

fig_1 <- fig_1 %>%
  add_trace(
    x = ~Test[df_n_w$Distribution == 'norm'],
    y = ~power[df_n_w$Distribution == 'norm'],
    legendgroup = 'Normal',
    scalegroup = 'Normal',
    name = 'Normal',
    box = list(visible = TRUE),        # overlay a box plot; solid line = median
    meanline = list(visible = TRUE),   # dashed line = mean
    color = I('blue')
  )

fig_1 <- fig_1 %>%
  add_trace(
    x = ~Test[df_n_w$Distribution == 'weib'],
    y = ~power[df_n_w$Distribution == 'weib'],
    legendgroup = 'Weibull',
    scalegroup = 'Weibull',
    name = 'Weibull',
    box = list(visible = TRUE),
    meanline = list(visible = TRUE),
    color = I('pink')
  )

fig_1 <- fig_1 %>%
  layout(title = 'Test vs Power',
         xaxis = list(title = 'Test', zeroline = FALSE),
         yaxis = list(title = 'Power', zeroline = FALSE),
         violinmode = 'group'   # show the two distributions side by side for each test
  )

fig_1

3.2 Violin Plot for Test vs FP

fig_2 <- df_n_w %>%
  plot_ly(type = "violin")

fig_2 <- fig_2 %>%
  add_trace(
    x = ~Test[df_n_w$Distribution == 'norm'],
    y = ~FP[df_n_w$Distribution == 'norm'],
    legendgroup = 'Normal',
    scalegroup = 'Normal',
    name = 'Normal',
    box = list(visible = TRUE),        # overlay a box plot; solid line = median
    meanline = list(visible = TRUE),   # dashed line = mean
    color = I('red')
  )

fig_2 <- fig_2 %>%
  add_trace(
    x = ~Test[df_n_w$Distribution == 'weib'],
    y = ~FP[df_n_w$Distribution == 'weib'],
    legendgroup = 'Weibull',
    scalegroup = 'Weibull',
    name = 'Weibull',
    box = list(visible = TRUE),
    meanline = list(visible = TRUE),
    color = I('green')
  )

fig_2 <- fig_2 %>%
  layout(title = 'Test vs FP',
         xaxis = list(title = 'Test', zeroline = FALSE),
         yaxis = list(title = 'FP', zeroline = FALSE),
         violinmode = 'group'
  )

fig_2

3.3 Normal Distribution: Scatter Plot for Power vs FP

df_n <- df_n_w[df_n_w$Distribution == 'norm',]
fig_3 <- plot_ly(data = df_n, type = 'scatter', x = ~power, y = ~FP, color = ~Test, mode = 'markers')
fig_3 <- fig_3 %>% layout(title = 'Power vs FP for Normal Distribution')
fig_3

3.4 Weibull Distribution: Scatter Plot for Power vs FP

df_w <- df_n_w[df_n_w$Distribution == 'weib',]

fig_4 <- plot_ly(data = df_w, type = 'scatter', x = ~power, y = ~FP, color = ~Test, mode = 'markers')
fig_4 <- fig_4 %>% layout(title = 'Power vs FP for Weibull Distribution')
fig_4
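The scatter plots are easier to compare with the summary in Section 5 when each test is reduced to a few numbers. The following is a minimal sketch that assumes a table with the `Distribution`, `Test`, `power`, and `FP` columns printed above. The hypothetical helper `summarise_by_test()` returns the median power and FP for each combination of distribution and test. The toy rows are copied from the first rows of the printout for illustration only. With the full data it would be called as `summarise_by_test(df_n_w)`.

```r
# Hypothetical helper: median power and FP for every Distribution x Test combination.
# as.data.frame() lets the same call work on a data.table or a plain data.frame.
summarise_by_test <- function(df) {
  aggregate(cbind(power, FP) ~ Distribution + Test,
            data = as.data.frame(df), FUN = median)
}

# Toy input: the first five rows of df_n_w as printed above.
toy <- data.frame(
  Distribution = c("norm", "norm", "norm", "weib", "weib"),
  Test  = c("Mclust", "Bimodality Coefficient", "Hartigan Dip Test",
            "Mclust", "Bimodality Coefficient"),
  power = c(0.062, 0.005, 0.008, 0.716, 0.607),
  FP    = c(0.066, 0.003, 0.010, 0.781, 0.687)
)

summarise_by_test(toy)
```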

4. Beta Distribution

df_b <- readRDS("~/Nadeau/module2_b_sim_1000.rds")
df_b

##      Distribution Alpha  Sim nboot SampleSize Prop   s1   s2   N
##   1:         beta  0.05 1000   100         20  0.1 0.25 0.50  20
##   2:         beta  0.05 1000   100         20  0.1 0.25 0.50  20
##   3:         beta  0.05 1000   100         20  0.1 0.25 0.50  20
##   4:         beta  0.05 1000   100         40  0.1 0.25 0.50  40
##   5:         beta  0.05 1000   100         40  0.1 0.25 0.50  40
##  ---                                                            
## 296:         beta  0.05 1000   100        100  0.9 0.50 0.75 100
## 297:         beta  0.05 1000   100        100  0.9 0.50 0.75 100
## 298:         beta  0.05 1000   100        500  0.9 0.50 0.75 500
## 299:         beta  0.05 1000   100        500  0.9 0.50 0.75 500
## 300:         beta  0.05 1000   100        500  0.9 0.50 0.75 500
##                        Test power    FP
##   1:                 Mclust 0.887 0.113
##   2: Bimodality Coefficient 0.835 0.013
##   3:      Hartigan Dip Test 0.152 0.016
##   4:                 Mclust 0.997 0.212
##   5: Bimodality Coefficient 0.992 0.016
##  ---                                   
## 296: Bimodality Coefficient 0.980 0.005
## 297:      Hartigan Dip Test 0.128 0.003
## 298:                 Mclust 1.000 1.000
## 299: Bimodality Coefficient 1.000 0.000
## 300:      Hartigan Dip Test 0.304 0.000

4.1 Violin Plot for Test vs Power

fig_5 <- df_b %>%
  plot_ly(
    type = "violin")

fig_5 <- fig_5 %>%
  add_trace(
    x = ~Test[df_b$Distribution == 'beta'],
    y = ~power[df_b$Distribution == 'beta'],
    legendgroup = 'Beta',
    scalegroup = 'Beta',
    name = 'Beta',
    box = list(visible = TRUE),        # overlay a box plot; solid line = median
    meanline = list(visible = TRUE),   # dashed line = mean
    color = I('orange')
  )

fig_5 <- fig_5 %>%
  layout(title = 'Test vs Power',
         xaxis = list(title = 'Test', zeroline = FALSE),
         yaxis = list(title = 'Power', zeroline = FALSE),
         violinmode = 'group'
  )

fig_5

4.2 Violin Plot for Test vs FP

fig_6 <- df_b %>%
  plot_ly(type = 'violin')

fig_6 <- fig_6 %>%
  add_trace(
    x = ~Test[df_b$Distribution == 'beta'],
    y = ~FP[df_b$Distribution == 'beta'],
    legendgroup = 'Beta',
    scalegroup = 'Beta',
    name = 'Beta',
    box = list(visible = TRUE),        # overlay a box plot; solid line = median
    meanline = list(visible = TRUE),   # dashed line = mean
    color = I('purple')
  )

fig_6 <- fig_6 %>%
  layout(title = 'Test vs FP',
         xaxis = list(title = 'Test', zeroline = FALSE),
         yaxis = list(title = 'FP', zeroline = FALSE),
         violinmode = 'group'
  )
fig_6

4.3 Beta Distribution: Scatter Plot for Power vs FP

fig_7 <- plot_ly(data = df_b, type = 'scatter', x = ~power, y = ~FP, color = ~Test, mode = 'markers')
fig_7 <- fig_7 %>% layout(title = 'Power vs FP for Beta Distribution')
fig_7

5. Summary

5.1 Power:

Normal: Mclust has the highest overall power, followed by the Bimodality Coefficient (BC), while the Hartigan Dip Test (HDT) has the lowest power.

Weibull: Mclust power is spread fairly evenly between 0 and 1, BC is concentrated mostly between 0.8 and 1, and HDT lies mainly below 0.2.

Beta: Mclust outperforms the other two tests, followed by BC, while HDT has the most cases with low power.

5.2 FP:

Normal: BC has the most cases with FP close to 0, HDT's FP lies mainly between 0 and 0.005, and Mclust's FP lies between 0.05 and 0.1.

Weibull: HDT has the lowest FP (0 to 0.022), while the FP of the other two tests spans a much wider range (0 to 1).

Beta: BC and HDT show more observations with low FP, while Mclust shows more observations with high FP.

5.3 Power vs. FP:

Normal: BC and HDT have relatively more observations with a high power-to-FP ratio, while Mclust has fewer.

Weibull: HDT shows the most observations with a high power-to-FP ratio, while the other two tests perform moderately.

Beta: BC shows the most cases with a high power-to-FP ratio, while HDT has more cases of low FP and Mclust more cases of high FP.
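A power-to-FP ratio is undefined when FP is 0, and several of the rows printed in Section 4 have FP = 0. One way to make the ratio computable is to floor FP at a small value before dividing and then compare the median ratio per test. This is sketched below as an assumption, not as the method behind the plots. The floor of 0.001 corresponds to one false positive in 1000 simulations. The helper `power_fp_ratio()` is hypothetical, and the toy rows are the ten beta rows printed in Section 4, used for illustration only.

```r
# Hypothetical helper: power-to-FP ratio with FP floored at fp_floor, so rows
# with FP = 0 give a large finite ratio instead of Inf or NaN.
power_fp_ratio <- function(power, FP, fp_floor = 1 / 1000) {
  power / pmax(FP, fp_floor)
}

# Toy input: the ten rows of df_b printed in Section 4 (illustration only).
beta_rows <- data.frame(
  Test  = c("Mclust", "Bimodality Coefficient", "Hartigan Dip Test",
            "Mclust", "Bimodality Coefficient",
            "Bimodality Coefficient", "Hartigan Dip Test",
            "Mclust", "Bimodality Coefficient", "Hartigan Dip Test"),
  power = c(0.887, 0.835, 0.152, 0.997, 0.992, 0.980, 0.128, 1.000, 1.000, 0.304),
  FP    = c(0.113, 0.013, 0.016, 0.212, 0.016, 0.005, 0.003, 1.000, 0.000, 0.000)
)

beta_rows$ratio <- with(beta_rows, power_fp_ratio(power, FP))

# Median ratio per test; with the full data, df_b would replace beta_rows.
median_ratio <- tapply(beta_rows$ratio, beta_rows$Test, median)
round(median_ratio, 1)
```

The choice of floor changes the size of the ratio for rows with FP = 0, so comparisons based on medians or on the share of rows above a threshold are less sensitive to it than comparisons based on means.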