Aim to test Joe’s mice AY column data bimodality
library(plotly)
library(mclust)
<- read.csv("~/Nadeau/bifurcatorR_Mao's fork/bifurcatoR_Mao's fork/data/mice_AY_column.csv")
data
cat("The sample size is:", nrow(data))
## The sample size is: 48
cat("\n\nThe summary of the data is:\n")
##
##
## The summary of the data is:
summary(data$MAD)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1500 0.7188 1.5050 1.8997 2.8188 6.0250
plot_ly(x = ~data$MAD, type = "histogram", name = "Histogram", nbinsx = 10) %>%
add_lines(x = density(data$MAD)$x,
y = density(data$MAD)$y,
yaxis = "y2",
name = "Density")%>%
layout(title = "Mice MAD",
xaxis = list(title = "MAD", zeroline = FALSE),
yaxis = list(title = "Frequency", zeroline = FALSE),
yaxis2 = list(overlaying = "y", side = "right", rangemode = "tozero"),
bargap = 0.1)
BIC is the value of the maximized log-likelihood with a penalty on the number of parameters in the model. In general, the larger the value of the BIC, the stronger the evidence for the model and number of clusters.
<- densityMclust(data$MAD) mod
plot(mod, what = "BIC")
#plot(mod, what = "density")
summary(mod)
## -------------------------------------------------------
## Density estimation via Gaussian finite mixture modeling
## -------------------------------------------------------
##
## Mclust V (univariate, unequal variance) model with 2 components:
##
## log-likelihood n df BIC ICL
## -72.42638 48 5 -164.2088 -173.215
By comparing BIC values above, model E with 2 clusters has the highest BIC value(-164.2088). Therefore, the data is identified as bimodality. The density plot also presents the same characteristic visually, the second peak value is lower than the first one though.