This report analyzes and summarizes the results of 1000 simulations for three distinct tests (Mclust, Hartigan Dip Test, and Bimodality Coefficient) applied to several distributions (normal, Weibull, beta). Power and the false-positive rate (FP) are used as the evaluation criteria.
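As a rough illustration of the two criteria, the sketch below estimates each one as the share of simulations in which a test rejects at level alpha. This is not the procedure used to produce the results in this report: the p-value vectors `p_null` and `p_alt` are hypothetical stand-ins drawn from arbitrary distributions, chosen only to show the arithmetic.

```r
# Hypothetical illustration: rejection rates estimated from simulated p-values.
set.seed(1)
alpha <- 0.05   # significance level, as in the Alpha column
n_sim <- 1000   # number of simulations, as in the Sim column

# Assumption: under a true null, a well-calibrated test has uniform p-values.
p_null <- runif(n_sim)
# FP: share of null simulations in which the test wrongly rejects.
fp <- mean(p_null < alpha)

# Assumption: under the alternative, p-values concentrate near 0.
# Beta(0.2, 1) is an arbitrary stand-in, not a model of any test used here.
p_alt <- rbeta(n_sim, shape1 = 0.2, shape2 = 1)
# Power: share of alternative simulations in which the test correctly rejects.
power <- mean(p_alt < alpha)

c(power = power, FP = fp)
```

Each `power` and `FP` value in the tables below can be read the same way: a proportion between 0 and 1 computed over the simulations for one parameter setting.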
All graphs are interactive, so you can explore the plots yourself. For example, by clicking entries in the legend you can focus on a single distribution or compare the performance of both. You can also use the toolbox in the upper right to zoom in, zoom out, and so on.
Violin plots combined with box plots are used to visualize the distribution of power or FP, so that you can compare the performance of different groups. The curve in a violin plot is a kernel density estimate of the underlying data distribution. A wider section of the curve indicates a higher density of data, while a narrower section indicates a lower density. If you hover over a plot, summary statistics including the min, max, mean, and median are displayed. The solid line indicates the median, while the dashed line shows the mean.
Scatter plots are used to show the relationship between power and FP.
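To make the violin description concrete, here is a minimal sketch of the quantities such a plot draws, computed with base R. The vector `x` is a hypothetical stand-in for a column of power values, and `density()` with its default Gaussian kernel and bandwidth is an assumption; plotly may choose a different bandwidth, so the shape will not match the figures exactly.

```r
# Hypothetical data standing in for one group of power values in [0, 1].
set.seed(42)
x <- rbeta(200, shape1 = 2, shape2 = 5)

# Kernel density estimate: kde$x is the evaluation grid, kde$y the density.
# The violin's width at a given value is proportional to kde$y there.
kde <- density(x)

# Location of the widest part of the violin (highest estimated density).
peak <- kde$x[which.max(kde$y)]

# Summary statistics of the kind shown on hover.
stats <- c(min = min(x), median = median(x), mean = mean(x), max = max(x))

peak
stats
```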
library("dplyr")
library(plotly)
df_n_w <- readRDS("~/Nadeau/module2_n_w_sim_1000.rds")
df_n_w
## Distribution Alpha Sim nboot SampleSize Prop mu1 sd1 mu2 sd2 N
## 1: norm 0.05 1000 100 20 0.1 0.01 1 1 1 20
## 2: norm 0.05 1000 100 20 0.1 0.01 1 1 1 20
## 3: norm 0.05 1000 100 20 0.1 0.01 1 1 1 20
## 4: weib 0.05 1000 100 20 0.1 0.01 1 1 1 20
## 5: weib 0.05 1000 100 20 0.1 0.01 1 1 1 20
## ---
## 1796: norm 0.05 1000 100 500 0.9 0.01 1 8 4 500
## 1797: norm 0.05 1000 100 500 0.9 0.01 1 8 4 500
## 1798: weib 0.05 1000 100 500 0.9 0.01 1 8 4 500
## 1799: weib 0.05 1000 100 500 0.9 0.01 1 8 4 500
## 1800: weib 0.05 1000 100 500 0.9 0.01 1 8 4 500
## Test power FP
## 1: Mclust 0.062 0.066
## 2: Bimodality Coefficient 0.005 0.003
## 3: Hartigan Dip Test 0.008 0.010
## 4: Mclust 0.716 0.781
## 5: Bimodality Coefficient 0.607 0.687
## ---
## 1796: Bimodality Coefficient 1.000 0.000
## 1797: Hartigan Dip Test 0.000 0.000
## 1798: Mclust 0.000 0.000
## 1799: Bimodality Coefficient 1.000 1.000
## 1800: Hartigan Dip Test 0.000 0.000
fig_1 <- df_n_w %>%
  plot_ly(
    type = "violin")
fig_1 <- fig_1 %>%
  add_trace(
    x = ~Test[df_n_w$Distribution == 'norm'],
    y = ~power[df_n_w$Distribution == 'norm'],
    legendgroup = 'Normal',
    scalegroup = 'Normal',
    name = 'Normal',
    box = list(visible = TRUE), # combine with a box plot, solid line (median)
    meanline = list(visible = TRUE), # dashed line (mean)
    color = I('blue')
  )
fig_1 <- fig_1 %>%
  add_trace(
    x = ~Test[df_n_w$Distribution == 'weib'],
    y = ~power[df_n_w$Distribution == 'weib'],
    legendgroup = 'Weibull',
    scalegroup = 'Weibull',
    name = 'Weibull',
    box = list(visible = TRUE),
    meanline = list(visible = TRUE),
    color = I('pink')
  )
fig_1 <- fig_1 %>%
  layout(title = 'Test vs Power',
         xaxis = list(title = 'Test', zeroline = FALSE),
         yaxis = list(title = 'Power', zeroline = FALSE),
         violinmode = 'group'
  )
fig_1
fig_2 <- df_n_w %>%
  plot_ly(
    type = "violin")
fig_2 <- fig_2 %>%
  add_trace(
    x = ~Test[df_n_w$Distribution == 'norm'],
    y = ~FP[df_n_w$Distribution == 'norm'],
    legendgroup = 'Norm',
    scalegroup = 'Norm',
    name = 'Norm',
    box = list(visible = TRUE),
    meanline = list(visible = TRUE),
    color = I('red')
  )
fig_2 <- fig_2 %>%
  add_trace(
    x = ~Test[df_n_w$Distribution == 'weib'],
    y = ~FP[df_n_w$Distribution == 'weib'],
    legendgroup = 'Weibull',
    scalegroup = 'Weibull',
    name = 'Weibull',
    box = list(visible = TRUE),
    meanline = list(visible = TRUE),
    color = I('green')
  )
fig_2 <- fig_2 %>%
  layout(title = 'Test vs FP',
         xaxis = list(title = 'Test', zeroline = FALSE),
         yaxis = list(title = 'FP', zeroline = FALSE),
         violinmode = 'group'
  )
fig_2
df_n <- df_n_w[df_n_w$Distribution == 'norm',]
fig_3 <- plot_ly(data = df_n, type = 'scatter', x = ~power, y = ~FP, color = ~Test, mode = 'markers')
fig_3 <- fig_3 %>% layout(title = 'Power vs FP for Normal Distribution')
fig_3
df_w <- df_n_w[df_n_w$Distribution == 'weib',]
fig_4 <- plot_ly(data = df_w, type = 'scatter', x = ~power, y = ~FP, color = ~Test, mode = 'markers')
fig_4 <- fig_4 %>% layout(title = 'Power vs FP for Weibull Distribution')
fig_4
df_b <- readRDS("~/Nadeau/module2_b_sim_1000.rds")
df_b
## Distribution Alpha Sim nboot SampleSize Prop s1 s2 N
## 1: beta 0.05 1000 100 20 0.1 0.25 0.50 20
## 2: beta 0.05 1000 100 20 0.1 0.25 0.50 20
## 3: beta 0.05 1000 100 20 0.1 0.25 0.50 20
## 4: beta 0.05 1000 100 40 0.1 0.25 0.50 40
## 5: beta 0.05 1000 100 40 0.1 0.25 0.50 40
## ---
## 296: beta 0.05 1000 100 100 0.9 0.50 0.75 100
## 297: beta 0.05 1000 100 100 0.9 0.50 0.75 100
## 298: beta 0.05 1000 100 500 0.9 0.50 0.75 500
## 299: beta 0.05 1000 100 500 0.9 0.50 0.75 500
## 300: beta 0.05 1000 100 500 0.9 0.50 0.75 500
## Test power FP
## 1: Mclust 0.887 0.113
## 2: Bimodality Coefficient 0.835 0.013
## 3: Hartigan Dip Test 0.152 0.016
## 4: Mclust 0.997 0.212
## 5: Bimodality Coefficient 0.992 0.016
## ---
## 296: Bimodality Coefficient 0.980 0.005
## 297: Hartigan Dip Test 0.128 0.003
## 298: Mclust 1.000 1.000
## 299: Bimodality Coefficient 1.000 0.000
## 300: Hartigan Dip Test 0.304 0.000
fig_5 <- df_b %>%
  plot_ly(
    type = "violin")
fig_5 <- fig_5 %>%
  add_trace(
    x = ~Test[df_b$Distribution == 'beta'],
    y = ~power[df_b$Distribution == 'beta'],
    legendgroup = 'Beta',
    scalegroup = 'Beta',
    name = 'Beta',
    box = list(visible = TRUE),
    meanline = list(visible = TRUE),
    color = I('orange')
  )
fig_5 <- fig_5 %>%
  layout(title = 'Test vs Power',
         xaxis = list(title = 'Test', zeroline = FALSE),
         yaxis = list(title = 'Power', zeroline = FALSE),
         violinmode = 'group'
  )
fig_5
fig_6 <- df_b %>%
  plot_ly(
    type = 'violin'
  )
fig_6 <- fig_6 %>%
  add_trace(
    x = ~Test[df_b$Distribution == 'beta'],
    y = ~FP[df_b$Distribution == 'beta'],
    legendgroup = 'Beta',
    scalegroup = 'Beta',
    name = 'Beta',
    box = list(visible = TRUE),
    meanline = list(visible = TRUE),
    color = I('purple')
  )
fig_6 <- fig_6 %>%
  layout(title = 'Test vs FP',
         xaxis = list(title = 'Test', zeroline = FALSE),
         yaxis = list(title = 'FP', zeroline = FALSE),
         violinmode = 'group'
  )
fig_6
fig_7 <- plot_ly(data = df_b, type = 'scatter', x = ~power, y = ~FP, color = ~Test, mode = 'markers')
fig_7 <- fig_7 %>% layout(title = 'Power vs FP for Beta Distribution')
fig_7
Normal: Mclust has the highest overall power, followed by the Bimodality Coefficient (BC), and the Hartigan Dip Test (HDT) has the lowest.
Weibull: Mclust's power is spread roughly evenly between 0 and 1, BC's is concentrated around 0.8 to 1, and HDT's lies mainly below 0.2.
Beta: Mclust outperforms the other two, followed by BC, while HDT has the most cases of low power.
Normal: BC has the most cases close to 0 FP, HDT's values scatter mainly between 0 and 0.005, and Mclust's lie between 0.05 and 0.1.
Weibull: HDT has the lowest FP (0 to 0.022), while the other two range much higher (0 to 1).
Beta: BC and HDT have more observations with low FP, while Mclust has more observations with high FP.
Normal: BC and HDT have relatively more observations with a high power-to-FP ratio, while Mclust has fewer.
Weibull: HDT shows the most observations with a high power-to-FP ratio, while the other two perform only moderately.
Beta: BC displays the most cases with a high power-to-FP ratio; HDT has more cases of low FP, and Mclust has more cases of high FP.
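The power-to-FP comparison above is read off the scatter plots. As a minimal sketch, it can also be computed directly. The three rows below are copied from the first three rows of the beta output shown earlier, so they represent only one parameter setting; the `ratio` column and the guard that returns `NA` when FP is 0 are assumptions made for this illustration.

```r
# Rows 1 to 3 of the beta results printed above (one parameter setting only).
toy <- data.frame(
  Test  = c("Mclust", "Bimodality Coefficient", "Hartigan Dip Test"),
  power = c(0.887, 0.835, 0.152),
  FP    = c(0.113, 0.013, 0.016)
)

# Power-to-FP ratio; left as NA when FP is 0 to avoid dividing by zero.
toy$ratio <- ifelse(toy$FP > 0, toy$power / toy$FP, NA_real_)

# Sort so the test with the highest ratio comes first.
toy[order(-toy$ratio), ]
```

For this single setting BC has the highest ratio, which agrees with the reading of the beta scatter plot. The same calculation on the full `df_b` would need a decision on how to treat rows where FP is exactly 0.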