The procedure of partitioning a set of observations into a set of meaningful subclasses or clusters. All the observations in a cluster share some similarities but are essentially distinct from the cluster observations
Clustering methods
Partitioning techniques: find the centers of clusters among the observations and each one is assigned to the cluster that has the closest center. Ex: K-means
Hierarchical techniques: Connect the observations based on their similarity to from clusters. Ex: Hierarchical cluster
Model-based methods: Use probabilistic distributions to create clusters. Ex. Mixture models. It’s suitable when we are interested not only in the clusters but also in measuring the probability of belonging to a cluster
Instead of optimizing each observation to clsuters’ centers, like k-means, we optimize a probabilistic model
library(ggplot2)
library(dplyr)
library(mixtools) # the poisson ditribution is not implemented
## Warning: package 'mixtools' was built under R version 4.0.3
library(flexmix)
## Warning: package 'flexmix' was built under R version 4.0.3
library(magrittr)
library(ellipse)
## Warning: package 'ellipse' was built under R version 4.0.3
library(tidyr)
## Warning: package 'tidyr' was built under R version 4.0.3
gender <- read.csv("gender.csv")
data <- read.csv("digits.csv")
crimes <- read.csv("CoC_crimes.csv")
# Have a look to gender_with_probs (after clustering)
head(gender )
## Gender Height Weight BMI probability
## 1 Male 73.84702 241.8936 0.04435662 5.778312e-06
## 2 Male 68.78190 162.3105 0.03430822 6.059525e-01
## 3 Male 74.11011 212.7409 0.03873433 2.625952e-05
## 4 Male 71.73098 220.0425 0.04276545 3.628734e-04
## 5 Male 69.88180 206.3498 0.04225479 4.611901e-03
## 6 Male 67.25302 152.2122 0.03365316 9.110687e-01
# Scatterplot with probabilities
gender %>%
ggplot(aes(x = Weight, y = BMI, col = probability))+
geom_point(alpha = 0.5)
population_sample <- rnorm(1000, 10, 2)
mean_estimate <- mean(population_sample)
std_estimate <- sd(population_sample)
population_sample %<>% data.frame(x = population_sample)
ggplot(data =population_sample) +
geom_histogram(aes(x = x, y = ..density..)) +
stat_function(geom = "line", fun = dnorm, args = list(mean =mean_estimate,
sd= std_estimate)) +
theme_minimal()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Plot the sample with the estimated curve
mean_estimate <- gender$Weight %>% mean()
sd_estimate <- gender$Weight %>% sd()
gender %>%
ggplot() + geom_histogram(aes(x = Weight, y = ..density..), bins = 100) +
stat_function(geom = "line", fun = dnorm,
args = list(mean = mean_estimate, sd = sd_estimate)) +
theme_minimal()
It appears to be two different graussian distributions, but we are treating them as one.
number_of_obs <- 500
coin <- sample(c(0, 1), size = number_of_obs,
replace = TRUE, prob = c(0.5, 0.5))
head(coin)
## [1] 1 0 1 0 1 0
table(coin)
## coin
## 0 1
## 237 263
# Gaussian 1: Heads
gauss_1 <- rnorm(n = number_of_obs, mean = 5, sd = 2)
# Gaussian 2: Tails
gauss_2 <- rnorm(n = number_of_obs)
mixture_simulation <- ifelse(coin, gauss_1, gauss_2)
head(cbind(coin, gauss_1, gauss_2, mixture_simulation))
## coin gauss_1 gauss_2 mixture_simulation
## [1,] 1 4.801685 -1.1031314 4.8016846
## [2,] 0 6.984278 -0.1769560 -0.1769560
## [3,] 1 4.207626 -1.5725863 4.2076261
## [4,] 0 1.798294 -1.5954626 -1.5954626
## [5,] 1 7.730304 -0.5768537 7.7303043
## [6,] 0 4.488410 -0.2602325 -0.2602325
mixture_simulation %<>% data.frame(x = mixture_simulation)
ggplot(mixture_simulation) +
geom_histogram(aes(x=x, ..density..), bins = 40)
proportions <- sample(c(0, 1, 2), number_of_obs,
replace = TRUE, prob =c(1/3, 1/3, 1/3))
gauss_3 <- rnorm(n= number_of_obs,mean= 10, sd = 1)
mixture_simulation <- data.frame(x = ifelse(proportions == 0, gauss_1,
ifelse(proportions == 1, gauss_2, gauss_3)))
ggplot(mixture_simulation) +
geom_histogram(aes(x=x, y =..density..))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
To cluster the data with mixture model, there are three questions to be answered:
Imagine you work for a company that has M different products and N clients. You have been provided by a matrix where each row represents a client and each column, a product. The elements of this matrix are the quantity of the corresponding product bought last year by the client. If you are asked to find clusters of clients, which distribution would you use?
R: POISSON DISTRIBUTION
show_digit <- function(arr256, col=gray(4:1/4), ...) {
arr256 <- as.numeric(arr256)
image(matrix(arr256, nrow=16)[,16:1],col=col,...)
}
# Apply `glimpse` to the data
glimpse(data)
## Rows: 1,593
## Columns: 266
## $ V1 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V2 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V3 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,...
## $ V4 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,...
## $ V5 <int> 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0,...
## $ V6 <int> 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0,...
## $ V7 <int> 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0,...
## $ V8 <int> 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0,...
## $ V9 <int> 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0,...
## $ V10 <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0,...
## $ V11 <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0,...
## $ V12 <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0,...
## $ V13 <int> 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1,...
## $ V14 <int> 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1,...
## $ V15 <int> 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1,...
## $ V16 <int> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ V17 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V18 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,...
## $ V19 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0,...
## $ V20 <int> 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0,...
## $ V21 <int> 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0,...
## $ V22 <int> 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0,...
## $ V23 <int> 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0,...
## $ V24 <int> 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0,...
## $ V25 <int> 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0,...
## $ V26 <int> 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0,...
## $ V27 <int> 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0,...
## $ V28 <int> 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0,...
## $ V29 <int> 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ V30 <int> 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1,...
## $ V31 <int> 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1,...
## $ V32 <int> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ V33 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,...
## $ V34 <int> 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0,...
## $ V35 <int> 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0,...
## $ V36 <int> 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0,...
## $ V37 <int> 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0,...
## $ V38 <int> 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0,...
## $ V39 <int> 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0,...
## $ V40 <int> 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0,...
## $ V41 <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,...
## $ V42 <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0,...
## $ V43 <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0,...
## $ V44 <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0,...
## $ V45 <int> 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1,...
## $ V46 <int> 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1,...
## $ V47 <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1,...
## $ V48 <int> 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1,...
## $ V49 <int> 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0,...
## $ V50 <int> 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0,...
## $ V51 <int> 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0,...
## $ V52 <int> 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0,...
## $ V53 <int> 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ V54 <int> 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ V55 <int> 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ V56 <int> 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ V57 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V58 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V59 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V60 <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0,...
## $ V61 <int> 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1,...
## $ V62 <int> 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1,...
## $ V63 <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1,...
## $ V64 <int> 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1,...
## $ V65 <int> 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0,...
## $ V66 <int> 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0,...
## $ V67 <int> 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0,...
## $ V68 <int> 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0,...
## $ V69 <int> 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ V70 <int> 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ V71 <int> 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ V72 <int> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V73 <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V74 <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V75 <int> 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V76 <int> 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V77 <int> 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1,...
## $ V78 <int> 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1,...
## $ V79 <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ V80 <int> 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0,...
## $ V81 <int> 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0,...
## $ V82 <int> 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0,...
## $ V83 <int> 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0,...
## $ V84 <int> 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ V85 <int> 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ V86 <int> 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ V87 <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V88 <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V89 <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V90 <int> 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V91 <int> 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V92 <int> 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V93 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ V94 <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1,...
## $ V95 <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ V96 <int> 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0,...
## $ V97 <int> 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0,...
## $ V98 <int> 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0,...
## $ V99 <int> 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0,...
## $ V100 <int> 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ V101 <int> 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ V102 <int> 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ V103 <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V104 <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V105 <int> 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V106 <int> 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V107 <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V108 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ V109 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,...
## $ V110 <int> 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1,...
## $ V111 <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,...
## $ V112 <int> 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0,...
## $ V113 <int> 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,...
## $ V114 <int> 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0,...
## $ V115 <int> 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0,...
## $ V116 <int> 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ V117 <int> 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ V118 <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V119 <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V120 <int> 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V121 <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V122 <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V123 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ V124 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,...
## $ V125 <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1,...
## $ V126 <int> 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0,...
## $ V127 <int> 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,...
## $ V128 <int> 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0,...
## $ V129 <int> 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,...
## $ V130 <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0,...
## $ V131 <int> 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0,...
## $ V132 <int> 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ V133 <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ V134 <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V135 <int> 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V136 <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V137 <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V138 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ V139 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ V140 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,...
## $ V141 <int> 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1,...
## $ V142 <int> 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ V143 <int> 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,...
## $ V144 <int> 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0,...
## $ V145 <int> 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0,...
## $ V146 <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0,...
## $ V147 <int> 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0,...
## $ V148 <int> 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ V149 <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ V150 <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V151 <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V152 <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V153 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ V154 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,...
## $ V155 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,...
## $ V156 <int> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1,...
## $ V157 <int> 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1,...
## $ V158 <int> 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ V159 <int> 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0,...
## $ V160 <int> 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0,...
## $ V161 <int> 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0,...
## $ V162 <int> 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0,...
## $ V163 <int> 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0,...
## $ V164 <int> 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0,...
## $ V165 <int> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0,...
## $ V166 <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V167 <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V168 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ V169 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ V170 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,...
## $ V171 <int> 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1,...
## $ V172 <int> 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1,...
## $ V173 <int> 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ V174 <int> 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0,...
## $ V175 <int> 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0,...
## $ V176 <int> 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0,...
## $ V177 <int> 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0,...
## $ V178 <int> 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0,...
## $ V179 <int> 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0,...
## $ V180 <int> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0,...
## $ V181 <int> 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ V182 <int> 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ V183 <int> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ V184 <int> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ V185 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,...
## $ V186 <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1,...
## $ V187 <int> 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1,...
## $ V188 <int> 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1,...
## $ V189 <int> 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0,...
## $ V190 <int> 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0,...
## $ V191 <int> 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0,...
## $ V192 <int> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,...
## $ V193 <int> 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0,...
## $ V194 <int> 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0,...
## $ V195 <int> 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0,...
## $ V196 <int> 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0,...
## $ V197 <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1,...
## $ V198 <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ V199 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,...
## $ V200 <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,...
## $ V201 <int> 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0,...
## $ V202 <int> 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1,...
## $ V203 <int> 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1,...
## $ V204 <int> 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,...
## $ V205 <int> 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0,...
## $ V206 <int> 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0,...
## $ V207 <int> 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0,...
## $ V208 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V209 <int> 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0,...
## $ V210 <int> 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0,...
## $ V211 <int> 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1,...
## $ V212 <int> 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ V213 <int> 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1,...
## $ V214 <int> 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1,...
## $ V215 <int> 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,...
## $ V216 <int> 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0,...
## $ V217 <int> 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0,...
## $ V218 <int> 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1,...
## $ V219 <int> 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1,...
## $ V220 <int> 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1,...
## $ V221 <int> 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0,...
## $ V222 <int> 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0,...
## $ V223 <int> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,...
## $ V224 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V225 <int> 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ V226 <int> 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1,...
## $ V227 <int> 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1,...
## $ V228 <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1,...
## $ V229 <int> 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ V230 <int> 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0,...
## $ V231 <int> 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,...
## $ V232 <int> 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,...
## $ V233 <int> 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,...
## $ V234 <int> 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0,...
## $ V235 <int> 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1,...
## $ V236 <int> 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1,...
## $ V237 <int> 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1,...
## $ V238 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V239 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V240 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V241 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ V242 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ V243 <int> 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ V244 <int> 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0,...
## $ V245 <int> 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0,...
## $ V246 <int> 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0,...
## $ V247 <int> 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,...
## $ V248 <int> 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,...
## $ V249 <int> 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0,...
## $ V250 <int> 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0,...
## $ V251 <int> 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,...
## $ V252 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,...
## $ V253 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V254 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V255 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V256 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V257 <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,...
## $ V258 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ V259 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V260 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V261 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V262 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V263 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V264 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V265 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ V266 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
# Digit in row 50
show_digit(data[50,])
## Warning in matrix(arr256, nrow = 16): la longitud de los datos [266] no es un
## submúltiplo o múltiplo del número de filas [16] en la matriz
# Digit in row 100
show_digit(data[100,])
## Warning in matrix(arr256, nrow = 16): la longitud de los datos [266] no es un
## submúltiplo o múltiplo del número de filas [16] en la matriz
Parameters estimation for mixture models is not an easy task. But if you are provided with the probabilities of belonging to each cluster, the estimation of the means and the proportions of the clusters is not so difficult.
random_numbers <- c(rnorm(5000, 5, 2), rnorm(5000))
probs <- seq(10000, 1, by=-1) / 10000
gaussian_sample_with_probs <- data.frame(
x = random_numbers,
prob_cluster1 = probs
) %>% mutate( prob_cluster2 = 1 - prob_cluster1) %>%
arrange(desc(prob_cluster1))
head(gaussian_sample_with_probs)
## x prob_cluster1 prob_cluster2
## 1 1.66212734 1.0000 0e+00
## 2 -0.04419671 0.9999 1e-04
## 3 8.14058740 0.9998 2e-04
## 4 5.14076818 0.9997 3e-04
## 5 1.19245685 0.9996 4e-04
## 6 5.11770965 0.9995 5e-04
# Estimation of the means
means_estimates <- gaussian_sample_with_probs %>%
summarise(mean_cluster1= sum(x*prob_cluster1)/sum(prob_cluster1),
mean_cluster2 = sum(x*prob_cluster2)/sum(prob_cluster2))
means_estimates
## mean_cluster1 mean_cluster2
## 1 3.711209 1.223994
# Estimation of the proportions
props_estimates <- gaussian_sample_with_probs %>%
summarise(props_cluster1 = mean(prob_cluster1),
props_cluster2 = 1 - mean(prob_cluster1))
props_estimates
## props_cluster1 props_cluster2
## 1 0.50005 0.49995
# Transform to a vector
means_estimates <- as.numeric(means_estimates)
# Plot histogram with means estimates
ggplot(gaussian_sample_with_probs) + geom_histogram(aes(x = x), bins = 100) +
geom_vline(xintercept = means_estimates)
gaussian_sample <- gaussian_sample_with_probs %>% select(x)
# Create data frame with probabilities
gaussian_sample_with_probs <- gaussian_sample %>%
mutate(prob_from_cluster1 = 0.35 * dnorm(x, mean = 10, sd = 10),
prob_from_cluster2 = 0.65 * dnorm(x, mean = 50, sd = 10),
prob_cluster1 = prob_from_cluster1/(prob_from_cluster1 + prob_from_cluster2),
prob_cluster2 = prob_from_cluster2/(prob_from_cluster1 + prob_from_cluster2)) %>%
select(x, prob_cluster1, prob_cluster2)
head(gaussian_sample_with_probs)
## x prob_cluster1 prob_cluster2
## 1 1.66212734 0.9999778 2.218416e-05
## 2 -0.04419671 0.9999888 1.121060e-05
## 3 8.14058740 0.9997040 2.960382e-04
## 4 5.14076818 0.9999108 8.918990e-05
## 5 1.19245685 0.9999816 1.838461e-05
## 6 5.11770965 0.9999116 8.837112e-05
# Arbitrary mean
means_init <- c(1,2)
means_init
## [1] 1 2
# Arbitrary proportions
props_init <- c(0.5, 0.5)
props_init
## [1] 0.5 0.5
expectation <- function(data, means, proportions, sds){
# Estimate the probabilities
exp_data <- data %>%
mutate(prob_from_cluster1 = proportions[1] * dnorm(x, mean = means[1], sd = sds[1]),
prob_from_cluster2 = proportions[2] * dnorm(x, mean = means[2], sd = sds[2]),
prob_cluster1 = prob_from_cluster1/(prob_from_cluster1 + prob_from_cluster2),
prob_cluster2 = prob_from_cluster2/(prob_from_cluster1 + prob_from_cluster2)) %>%
select(x, prob_cluster1, prob_cluster2)
# Return data with probabilities
return(exp_data)
}
maximization <- function(data_with_probs){
means_estimates <- data_with_probs %>%
summarise(mean_1 = sum(x * prob_cluster1) / sum(prob_cluster1),
mean_2 = sum(x * prob_cluster2) / sum(prob_cluster2)) %>%
as.numeric()
props_estimates <- data_with_probs %>%
summarise(proportion_1 = mean(prob_cluster1),
proportion_2 = 1 - proportion_1) %>%
as.numeric()
list(means_estimates, props_estimates)
}
means_init <- c(0, 100)
props_init <- c(0.5, 0.5)
# Iterative process
for(i in 1:1){
new_values <- maximization(expectation(gaussian_sample, means_init, props_init, c(10, 10)))
means_init <- new_values[[1]]
props_init <- new_values[[2]]
cat(c(i, means_init, props_init), "\n")
}
## 1 2.467726 8.864184 1 0
fun_gaussian <- function(x, mean, proportion){
proportion * dnorm(x, mean, sd = 10)
}
means_iter10 <- means_init
props_iter10 <- props_init
gaussian_sample %>%
ggplot() + geom_histogram(aes(x = x, y = ..density..), bins = 200) +
stat_function(geom = "line", fun = fun_gaussian,
args = list(mean = means_iter10[1], proportion = props_iter10[1])) +
stat_function(geom = "line", fun = fun_gaussian,
args = list(mean = means_iter10[2], proportion = props_iter10[2]))
## Univariate Gaussian Mixture models
To understand what type of mixture model we might use we must do EDA.
gender %>%
ggplot(aes(x=Weight)) + geom_histogram(bins=100)
Graphically we can identify Two clusters
Parameters:
Two means
Two Standard deviations
Two proportions
EM algorithm
set.seed(1515)
fit_mixture = flexmix(Weight ~ 1,
data=gender,
k = 2,
model = FLXMCnorm1(),
control = list(tol = 1e-15,
verbose = 0,
iter = 10000))
proportions <- prior(fit_mixture)
proportions
## [1] 0.5070334 0.4929666
comp_1 <- parameters(fit_mixture, component = 1)
comp_2 <- parameters(fit_mixture, component = 2)
parameters(fit_mixture)
## Comp.1 Comp.2
## mean 186.61523 135.54712
## sd 19.96247 18.94874
fun_prop <- function(x, mean, sd, proportion){
proportion * dnorm(x = x, mean = mean, sd = sd)
}
gender %>%
ggplot() + geom_histogram(aes(Weight, y = ..density..)) +
stat_function(geom = "line", fun = fun_prop,
args = list(mean = comp_1[1],
sd = comp_1[2],
proportion = proportions[1])) +
stat_function(geom = "line", fun = fun_prop,
args = list(mean = comp_2[1],
sd = comp_2[2],
proportion = proportions[2]))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Probability to belong to each cluster
posterior(fit_mixture) %>% head(20)
## [,1] [,2]
## [1,] 0.9999931 6.850964e-06
## [2,] 0.5578120 4.421880e-01
## [3,] 0.9993998 6.002129e-04
## [4,] 0.9997998 2.001852e-04
## [5,] 0.9984504 1.549566e-03
## [6,] 0.2455903 7.544097e-01
## [7,] 0.9618190 3.818101e-02
## [8,] 0.7318215 2.681785e-01
## [9,] 0.8912543 1.087457e-01
## [10,] 0.3626267 6.373733e-01
## [11,] 0.9735642 2.643577e-02
## [12,] 0.9994832 5.168492e-04
## [13,] 0.7085447 2.914553e-01
## [14,] 0.9822151 1.778490e-02
## [15,] 0.9729320 2.706796e-02
## [16,] 0.8298096 1.701904e-01
## [17,] 0.9930274 6.972603e-03
## [18,] 0.8429897 1.570103e-01
## [19,] 0.9711951 2.880493e-02
## [20,] 0.9532200 4.677999e-02
Assign to cluster
clusters(fit_mixture) %>% head()
## [1] 1 1 1 1 1 2
table(gender$Gender, clusters(fit_mixture))
##
## 1 2
## Female 500 4500
## Male 4556 444
Different to univariate GMM, in the bivariate case now we must estimate the means in two dimensions and the standard deviation and de covariance between the two variables (covariance matrix), and the proportions.
set.seed(1313)
fit_with_covariance <- flexmix(cbind(Weight,BMI) ~ 1,
data = gender,
k = 2,
model = FLXMCmvnorm(diag = TRUE), # Change to FaLSE to allow covariance between variables
control = list(tolerance = 1e-15, iter.max = 1000))
# Get the parameters
comp_1 <- parameters(fit_with_covariance, component = 1)
comp_2 <- parameters(fit_with_covariance, component = 2)
# The means
mean_comp_1 <- comp_1[1:2]
mean_comp_1
## [1] 186.30915383 0.03914529
mean_comp_2 <- comp_2[1:2]
mean_comp_2
## [1] 133.23110205 0.03293304
# The covariance matrices
covariance_comp_1 <- matrix(comp_1[3:6], nrow = 2)
covariance_comp_1
## [,1] [,2]
## [1,] 366.8305 0.000000e+00
## [2,] 0.0000 4.071906e-06
covariance_comp_2 <- matrix(comp_2[3:6], nrow = 2)
covariance_comp_2
## [,1] [,2]
## [1,] 286.8994 0.000000e+00
## [2,] 0.0000 6.202358e-06
# Create ellipse curve 1
ellipse_comp_1 <- ellipse(x = covariance_comp_1,
centre = mean_comp_1,
npoints = nrow(gender))
head(ellipse_comp_1)
## x y
## [1,] 219.4592 0.04263790
## [2,] 219.4384 0.04264010
## [3,] 219.4175 0.04264229
## [4,] 219.3967 0.04264448
## [5,] 219.3758 0.04264667
## [6,] 219.3549 0.04264886
# Create ellipse curve 2
ellipse_comp_2 <- ellipse(x = covariance_comp_2,
centre = mean_comp_2,
npoints = nrow(gender))
head(ellipse_comp_2)
## x y
## [1,] 162.5479 0.03724356
## [2,] 162.5295 0.03724627
## [3,] 162.5110 0.03724897
## [4,] 162.4926 0.03725168
## [5,] 162.4741 0.03725438
## [6,] 162.4556 0.03725708
# Plot the ellipses
gender %>%
ggplot(aes(x = Weight, y = BMI)) + geom_point()+
geom_path(data = data.frame(ellipse_comp_1), aes(x=x,y=y), col = "red") +
geom_path(data = data.frame(ellipse_comp_2), aes(x=x,y=y), col = "blue")
# Check the assignments
table(gender$Gender, clusters(fit_with_covariance))
##
## 1 2
## Female 595 4405
## Male 4723 277
# Create the vector of probabilities
p_cluster_1 <- c(0.8, 0.8, 0.2, 0.9)
# Create the sample for each pixel
pixel_1 <- sample(c(0, 1), 100, replace = TRUE,
prob = c(1-p_cluster_1[1], p_cluster_1[1]))
pixel_2 <- sample(c(0, 1), 100, replace = TRUE,
prob = c(1-p_cluster_1[2], p_cluster_1[2]))
pixel_3 <- sample(c(0, 1), 100, replace = TRUE,
prob = c(1-p_cluster_1[3], p_cluster_1[3]))
pixel_4 <- sample(c(0, 1), 100, replace = TRUE,
prob = c(1-p_cluster_1[4], p_cluster_1[4]))
# Combine the samples
sample_cluster_1 <- cbind(pixel_1, pixel_2, pixel_3, pixel_4)
# Have a look to the sample
head(sample_cluster_1)
## pixel_1 pixel_2 pixel_3 pixel_4
## [1,] 1 0 1 1
## [2,] 1 1 0 1
## [3,] 1 1 0 1
## [4,] 1 1 1 1
## [5,] 1 1 0 1
## [6,] 1 1 0 1
data_matrix <- as.matrix(data[])
show_digit(data_matrix[4,])
## Warning in matrix(arr256, nrow = 16): la longitud de los datos [266] no es un
## submúltiplo o múltiplo del número de filas [16] en la matriz
show_digit(data_matrix[67,])
## Warning in matrix(arr256, nrow = 16): la longitud de los datos [266] no es un
## submúltiplo o múltiplo del número de filas [16] en la matriz
set.seed(1513)
# Fit Bernoulli mixture model
bernoulli_mix_model <- flexmix(data_matrix ~ 1,
k = 10,
model = FLXMCmvbinary(),
control = list(tolerance = 1e-15, iter.max = 1000))
# Check the proportions
prior(bernoulli_mix_model)
## [1] 0.07863443 0.08959832 0.11288063 0.15800281 0.17153762 0.20231154 0.08909212
## [8] 0.09794253
# Extract the parameters for each cluster
param_comp_1 <- parameters(bernoulli_mix_model, component = 1)
param_comp_2 <- parameters(bernoulli_mix_model, component = 2)
param_comp_3 <- parameters(bernoulli_mix_model, component = 3)
param_comp_4 <- parameters(bernoulli_mix_model, component = 4)
# Visualize the clusters
show_digit(param_comp_1)
## Warning in matrix(arr256, nrow = 16): la longitud de los datos [266] no es un
## submúltiplo o múltiplo del número de filas [16] en la matriz
show_digit(param_comp_2)
## Warning in matrix(arr256, nrow = 16): la longitud de los datos [266] no es un
## submúltiplo o múltiplo del número de filas [16] en la matriz
show_digit(param_comp_3)
## Warning in matrix(arr256, nrow = 16): la longitud de los datos [266] no es un
## submúltiplo o múltiplo del número de filas [16] en la matriz
show_digit(param_comp_4)
## Warning in matrix(arr256, nrow = 16): la longitud de los datos [266] no es un
## submúltiplo o múltiplo del número de filas [16] en la matriz
Simulation
set.seed(1541)
# Create the vector of lambdas
lambda_1 <- c(150, 300, 50)
# Create the sample of each crime
assault_1 <- rpois(n = 10, lambda = lambda_1[1])
robbery_1 <- rpois(n = 10, lambda = lambda_1[2])
battery_1 <- rpois(n = 10, lambda = lambda_1[3])
# Combine the results
cities_1 <- cbind(assault_1, robbery_1, battery_1)
# Check the sample
cities_1
## assault_1 robbery_1 battery_1
## [1,] 155 287 56
## [2,] 146 311 49
## [3,] 151 304 40
## [4,] 122 326 46
## [5,] 149 283 48
## [6,] 143 269 53
## [7,] 142 295 52
## [8,] 136 289 43
## [9,] 153 297 54
## [10,] 163 311 58
# Check the dimension
dim(crimes)
## [1] 77 13
# Check with glimpse
glimpse(crimes)
## Rows: 77
## Columns: 13
## $ COMMUNITY <chr> "ALBANY PARK", "ARCHER HEIGHTS", "ARMOUR SQUARE...
## $ ASSAULT <int> 123, 51, 74, 169, 708, 1198, 118, 135, 337, 63,...
## $ BATTERY <int> 429, 134, 184, 448, 1681, 3347, 280, 350, 850, ...
## $ BURGLARY <int> 147, 92, 55, 194, 339, 517, 76, 145, 327, 64, 1...
## $ CRIMINAL.DAMAGE <int> 287, 114, 99, 379, 859, 1666, 150, 310, 528, 12...
## $ CRIMINAL.TRESPASS <int> 38, 23, 56, 43, 228, 265, 29, 36, 88, 29, 27, 3...
## $ DECEPTIVE.PRACTICE <int> 137, 67, 59, 178, 310, 767, 73, 200, 314, 89, 9...
## $ MOTOR.VEHICLE.THEFT <int> 176, 50, 37, 189, 281, 732, 58, 123, 411, 42, 6...
## $ NARCOTICS <int> 27, 18, 9, 30, 345, 1456, 15, 22, 119, 10, 35, ...
## $ OTHER <int> 107, 37, 48, 114, 584, 1261, 76, 88, 238, 22, 6...
## $ OTHER.OFFENSE <int> 158, 44, 35, 164, 590, 1130, 94, 142, 339, 77, ...
## $ ROBBERY <int> 144, 30, 98, 111, 349, 829, 65, 109, 172, 24, 5...
## $ THEFT <int> 690, 180, 263, 461, 1201, 2137, 239, 669, 846, ...
# Transform into a matrix, without `community`
matrix_crimes <- crimes %>%
select(-COMMUNITY) %>%
as.matrix()
# Check the first values
head(matrix_crimes)
## ASSAULT BATTERY BURGLARY CRIMINAL.DAMAGE CRIMINAL.TRESPASS
## [1,] 123 429 147 287 38
## [2,] 51 134 92 114 23
## [3,] 74 184 55 99 56
## [4,] 169 448 194 379 43
## [5,] 708 1681 339 859 228
## [6,] 1198 3347 517 1666 265
## DECEPTIVE.PRACTICE MOTOR.VEHICLE.THEFT NARCOTICS OTHER OTHER.OFFENSE
## [1,] 137 176 27 107 158
## [2,] 67 50 18 37 44
## [3,] 59 37 9 48 35
## [4,] 178 189 30 114 164
## [5,] 310 281 345 584 590
## [6,] 767 732 1456 1261 1130
## ROBBERY THEFT
## [1,] 144 690
## [2,] 30 180
## [3,] 98 263
## [4,] 111 461
## [5,] 349 1201
## [6,] 829 2137
set.seed(2017)
# Fit the Poisson mixture model
poisson_mm <- stepFlexmix(matrix_crimes ~ 1,
k = 1:15,
nrep = 5,
model = FLXMCmvpois(),
control = list(tolerance = 1e-15, iter.max = 1000))
## 1 : * * * * *
## 2 : * * * * *
## 3 : * * * * *
## 4 : * * * * *
## 5 : * * * * *
## 6 : * * * * *
## 7 : * * * * *
## 8 : * * * * *
## 9 : * * * * *
## 10 : * * * * *
## 11 : * * * * *
## 12 : * * * * *
## 13 : * * * * *
## 14 : * * * * *
## 15 : * * * * *
# Select the model that minimize the BIC
best_poisson_mm <- getModel(poisson_mm, which = "BIC")
# Get the parameters into a data frame
params_lambdas <- data.frame(parameters(best_poisson_mm))
# Add the column with the type of crime
params_lambdas_crime <- params_lambdas %>%
mutate(crime = colnames(matrix_crimes))
# Plot the clusters with their lambdas
params_lambdas_crime %>%
gather(cluster, lambdas, -crime) %>%
ggplot(aes(x = crime, y = lambdas, fill = crime)) +
geom_bar(stat = "identity") +
facet_wrap(~ cluster) +
theme(axis.text.x = element_text(angle = 90, hjust = 1),
legend.position = "none")
# Add the cluster assignments
crimes_with_clusters <- crimes %>%
mutate(cluster = factor(clusters(best_poisson_mm)))
# Enumerate the cluster's elements
crimes_with_clusters <- crimes_with_clusters %>%
group_by(cluster) %>%
mutate(number = row_number())
# Plot the clusters with the communities
crimes_with_clusters %>%
ggplot(aes(x = cluster, y = number, col = cluster)) +
geom_text(aes(label = COMMUNITY), size = 2.3)+
theme(legend.position="none")