Mice AY column bimodality

Objective

Aim to test Joe’s mice AY column data bimodality

Data loading and exploring

library(plotly)
library(mclust)

data <- read.csv("~/Nadeau/bifurcatorR_Mao's fork/bifurcatoR_Mao's fork/data/mice_AY_column.csv")


cat("The sample size is:", nrow(data))

## The sample size is: 48

cat("\n\nThe summary of the data is:\n")

## 
## 
## The summary of the data is:

summary(data$MAD)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1500  0.7188  1.5050  1.8997  2.8188  6.0250

Visualize mice MAD using histogram with density curve line

plot_ly(x = ~data$MAD, type = "histogram", name = "Histogram", nbinsx = 10) %>%
       add_lines(x = density(data$MAD)$x, 
                 y = density(data$MAD)$y,
                 yaxis = "y2",
                 name = "Density")%>%
       layout(title = "Mice MAD",
              xaxis = list(title = "MAD", zeroline = FALSE),
              yaxis = list(title = "Frequency", zeroline = FALSE),
              yaxis2 = list(overlaying = "y", side = "right", rangemode = "tozero"),
              bargap = 0.1)

Model-based Clustering using BIC(Bayesian Information Criterion)

BIC is the value of the maximized log-likelihood with a penalty on the number of parameters in the model. In general, the larger the value of the BIC, the stronger the evidence for the model and number of clusters.

mod <- densityMclust(data$MAD)

plot(mod, what = "BIC")

#plot(mod, what = "density")
summary(mod)

## ------------------------------------------------------- 
## Density estimation via Gaussian finite mixture modeling 
## ------------------------------------------------------- 
## 
## Mclust V (univariate, unequal variance) model with 2 components: 
## 
##  log-likelihood  n df       BIC      ICL
##       -72.42638 48  5 -164.2088 -173.215

Conclusion

By comparing BIC values above, model E with 2 clusters has the highest BIC value(-164.2088). Therefore, the data is identified as bimodality. The density plot also presents the same characteristic visually, the second peak value is lower than the first one though.

Reference:

https://cran.r-project.org/web/packages/mclust/vignettes/mclust.html http://sites.stat.washington.edu/raftery/Research/PDF/fraley2003.pdf https://pages.mtu.edu/~shanem/psy5220/daily/Day19/modelbasedclustering.html