# Load the data set and print per-column summary statistics.
data <- read.csv("data.csv")
summary(data)
##    Point_Id          x_.1                x_.2                x_.3
##  Min.   :   1   Min.   :-1.8935   Min.   :-29.046579   Min.   :-3.222667
##  1st Qu.:1251   1st Qu.:-0.5559   1st Qu.: -3.261161   1st Qu.:-0.485922
##  Median :2500   Median : 0.1813   Median : -0.047756   Median :-0.011177
##  Mean   :2500   Mean   : 0.1015   Mean   :  0.007917   Mean   :-0.001812
##  3rd Qu.:3750   3rd Qu.: 0.7361   3rd Qu.:  3.271308   3rd Qu.: 0.468251
##  Max.   :5000   Max.   : 1.9109   Max.   : 22.978962   Max.   : 2.679311
##       x_.4                x_.5                x_.6
##  Min.   :-6.0800   Min.   :-3.596550   Min.   :-4.196409
##  1st Qu.:-3.0414   1st Qu.:-0.660218   1st Qu.:-0.676217
##  Median :-1.2134   Median :-0.004962   Median :-0.032714
##  Mean   : 0.4409   Mean   : 0.006851   Mean   :-0.008161
##  3rd Qu.: 3.9946   3rd Qu.: 0.694779   3rd Qu.: 0.669347
##  Max.   : 7.5189   Max.   : 4.023667   Max.   : 3.327201
##        x_.7                x_.8                x_.9
##  Min.   :-8.21818   Min.   :-3.333501   Min.   :-3.084774
##  1st Qu.:-5.02061   1st Qu.:-0.588773   1st Qu.:-0.608806
##  Median :-3.21870   Median :-0.003886   Median : 0.028685
##  Mean   :-0.06125   Mean   : 0.005829   Mean   : 0.009755
##  3rd Qu.: 4.97732   3rd Qu.: 0.627532   3rd Qu.: 0.625041
##  Max.   : 7.76021   Max.   : 3.554115   Max.   : 3.783108
##      x_10               x_11                x_12               x_13
##  Min.   :0.000   Min.   :-24.11570   Min.   :-1.91071   Min.   :0.0000
##  1st Qu.:1.000   1st Qu.: -0.59061   1st Qu.:-0.66730   1st Qu.:0.0000
##  Median :1.000   Median :  0.01058   Median :-0.04499   Median :0.0000
##  Mean   :0.908   Mean   :  0.01091   Mean   :-0.01382   Mean   :0.4926
##  3rd Qu.:1.000   3rd Qu.:  0.63721   3rd Qu.: 0.64779   3rd Qu.:1.0000
##  Max.   :1.000   Max.   : 12.78021   Max.   : 2.18555   Max.   :1.0000
##       x_14
##  Min.   :0.0000
##  1st Qu.:1.0000
##  Median :1.0000
##  Mean   :0.7512
##  3rd Qu.:1.0000
##  Max.   :1.0000
library(GGally)
ggpairs(data)
Point_Id looks like just a noise variable: it runs from 1 to 5000 with median and mean 2500, as a plain row index would. The marginal density estimates all appear to be sums of Gaussians, and the pairwise plots all seem to show correlated Gaussians, which suggests that a mixture-of-Gaussians density model will perform well.
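One way to check the row-index reading of Point_Id, and to set the column aside before modelling, is sketched below. The helper name `drop_id_column` is hypothetical, and `toy` is a small synthetic stand-in for `data`, not the real file.

```r
# Hypothetical helper: report whether `id_col` is a plain 1..n row index
# and return the data frame with that column removed.
drop_id_column <- function(df, id_col = "Point_Id") {
  is_index <- setequal(df[[id_col]], seq_len(nrow(df)))
  features <- df[, setdiff(names(df), id_col), drop = FALSE]
  list(is_index = is_index, features = features)
}

# Synthetic stand-in for `data` (assumed values, for illustration only).
set.seed(1)
toy <- data.frame(Point_Id = 1:100, x_.1 = rnorm(100), x_.2 = rnorm(100))

res <- drop_id_column(toy)
res$is_index         # TRUE when the column is exactly 1..n
names(res$features)  # remaining feature columns
```

On the real data, the same call would be `drop_id_column(data)`, with `res$features` passed on to the density model.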
From the pairwise plots, it looks like \(K=2\) mixture components should model the data quite well.
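To make the \(K=2\) choice concrete, here is a minimal base-R sketch of fitting a Gaussian mixture by EM. It is an illustration under stated assumptions, not necessarily the fitting procedure used for this data: `log_dmvnorm` and `fit_gmm` are hypothetical helper names, the initialisation from k-means and the small ridge on the covariances are choices made here, and `toy` is synthetic two-cluster data rather than `data.csv`.

```r
# Log-density of a multivariate normal at each row of X (base R only).
log_dmvnorm <- function(X, mu, Sigma) {
  d <- ncol(X)
  log_det <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  -0.5 * (d * log(2 * pi) + log_det + mahalanobis(X, mu, Sigma))
}

# Plain EM for a K-component Gaussian mixture with full covariances.
fit_gmm <- function(X, K = 2, n_iter = 100, tol = 1e-8) {
  X <- as.matrix(X)
  n <- nrow(X)
  d <- ncol(X)
  # Assumed initialisation: hard assignments from k-means.
  cl <- kmeans(X, centers = K, nstart = 5)$cluster
  R <- outer(cl, seq_len(K), "==") * 1  # n x K responsibilities
  loglik <- -Inf
  for (it in seq_len(n_iter)) {
    # M-step: update weights, means and covariances from responsibilities.
    Nk <- colSums(R)
    pro <- Nk / n
    mu <- lapply(seq_len(K), function(k) colSums(R[, k] * X) / Nk[k])
    Sigma <- lapply(seq_len(K), function(k) {
      Xc <- sweep(X, 2, mu[[k]])
      # Small ridge (an assumption) keeps the covariance invertible.
      crossprod(Xc * sqrt(R[, k])) / Nk[k] + diag(1e-6, d)
    })
    # E-step: recompute responsibilities via a stable log-sum-exp.
    logp <- sapply(seq_len(K), function(k) {
      log(pro[k]) + log_dmvnorm(X, mu[[k]], Sigma[[k]])
    })
    m <- apply(logp, 1, max)
    lse <- m + log(rowSums(exp(logp - m)))
    R <- exp(logp - lse)
    new_loglik <- sum(lse)
    if (abs(new_loglik - loglik) < tol * abs(new_loglik)) break
    loglik <- new_loglik
  }
  list(pro = pro, mu = mu, Sigma = Sigma, loglik = new_loglik, resp = R)
}

# Synthetic two-cluster data (200 and 300 points), for illustration only.
set.seed(42)
toy <- rbind(matrix(rnorm(400, mean = -3), ncol = 2),
             matrix(rnorm(600, mean = 3), ncol = 2))
fit <- fit_gmm(toy, K = 2)
sort(fit$pro)  # estimated mixing proportions
fit$loglik     # log-likelihood at the fitted parameters
```

For the real data, the feature columns (without Point_Id) would replace `toy`. A dedicated package such as mclust is the more usual choice in practice; the hand-written EM is shown only to make the model explicit.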