MLSALT Density Modeling

# Load the dataset and print per-column summary statistics.
data <- read.csv('data.csv')
summary(data)

##     Point_Id         x_.1              x_.2                 x_.3          
##  Min.   :   1   Min.   :-1.8935   Min.   :-29.046579   Min.   :-3.222667  
##  1st Qu.:1251   1st Qu.:-0.5559   1st Qu.: -3.261161   1st Qu.:-0.485922  
##  Median :2500   Median : 0.1813   Median : -0.047756   Median :-0.011177  
##  Mean   :2500   Mean   : 0.1015   Mean   :  0.007917   Mean   :-0.001812  
##  3rd Qu.:3750   3rd Qu.: 0.7361   3rd Qu.:  3.271308   3rd Qu.: 0.468251  
##  Max.   :5000   Max.   : 1.9109   Max.   : 22.978962   Max.   : 2.679311  
##       x_.4              x_.5                x_.6          
##  Min.   :-6.0800   Min.   :-3.596550   Min.   :-4.196409  
##  1st Qu.:-3.0414   1st Qu.:-0.660218   1st Qu.:-0.676217  
##  Median :-1.2134   Median :-0.004962   Median :-0.032714  
##  Mean   : 0.4409   Mean   : 0.006851   Mean   :-0.008161  
##  3rd Qu.: 3.9946   3rd Qu.: 0.694779   3rd Qu.: 0.669347  
##  Max.   : 7.5189   Max.   : 4.023667   Max.   : 3.327201  
##       x_.7               x_.8                x_.9          
##  Min.   :-8.21818   Min.   :-3.333501   Min.   :-3.084774  
##  1st Qu.:-5.02061   1st Qu.:-0.588773   1st Qu.:-0.608806  
##  Median :-3.21870   Median :-0.003886   Median : 0.028685  
##  Mean   :-0.06125   Mean   : 0.005829   Mean   : 0.009755  
##  3rd Qu.: 4.97732   3rd Qu.: 0.627532   3rd Qu.: 0.625041  
##  Max.   : 7.76021   Max.   : 3.554115   Max.   : 3.783108  
##       x_10            x_11                x_12               x_13       
##  Min.   :0.000   Min.   :-24.11570   Min.   :-1.91071   Min.   :0.0000  
##  1st Qu.:1.000   1st Qu.: -0.59061   1st Qu.:-0.66730   1st Qu.:0.0000  
##  Median :1.000   Median :  0.01058   Median :-0.04499   Median :0.0000  
##  Mean   :0.908   Mean   :  0.01091   Mean   :-0.01382   Mean   :0.4926  
##  3rd Qu.:1.000   3rd Qu.:  0.63721   3rd Qu.: 0.64779   3rd Qu.:1.0000  
##  Max.   :1.000   Max.   : 12.78021   Max.   : 2.18555   Max.   :1.0000  
##       x_14       
##  Min.   :0.0000  
##  1st Qu.:1.0000  
##  Median :1.0000  
##  Mean   :0.7512  
##  3rd Qu.:1.0000  
##  Max.   :1.0000

library(GGally)
ggpairs(data)

Point_Id looks like just a noise variable: its summary matches a plain row index running from 1 to 5000, so it carries no information about the density. The marginal density estimates all appear to be sums of Gaussians, and the pairwise plots all seem to show correlated Gaussians, suggesting that a mixture-of-Gaussians density model will perform well.
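
For reference, a mixture-of-Gaussians density with \(K\) components can be written as below. The symbols \(\pi_k\), \(\boldsymbol{\mu}_k\) and \(\boldsymbol{\Sigma}_k\) (mixing weights, component means and component covariances) are standard notation assumed here; they do not appear in the original analysis.

\[
p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1.
\]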

From the pairwise plots, it looks like \(K=2\) mixture components should model the data quite well.
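
One way to check a visual guess like this is to fit mixtures with different numbers of components and compare them with an information criterion. The following is a minimal base-R sketch under stated assumptions, not the method used in this report: it fits a univariate mixture with EM, the function name fit_gmm is hypothetical, and the data are synthetic stand-ins rather than values from data.csv. BIC is computed as \(-2\log L + p\log n\), so lower is better.

```r
# Minimal EM for a univariate K-component Gaussian mixture (base R only).
# fit_gmm is a hypothetical helper written for this sketch.
fit_gmm <- function(x, K, n_iter = 200, tol = 1e-8) {
  n <- length(x)
  # Initialise: means at evenly spaced quantiles, common sd, equal weights.
  mu <- as.numeric(quantile(x, probs = (seq_len(K) - 0.5) / K))
  sigma <- rep(sd(x), K)
  w <- rep(1 / K, K)

  # n x K matrix of weighted component densities at the current parameters.
  weighted_dens <- function() {
    matrix(sapply(seq_len(K), function(k) w[k] * dnorm(x, mu[k], sigma[k])),
           nrow = n)
  }

  loglik_old <- -Inf
  for (iter in seq_len(n_iter)) {
    # E-step: responsibilities of each component for each point.
    dens <- weighted_dens()
    total <- rowSums(dens)
    resp <- dens / total
    # M-step: update weights, means and standard deviations.
    nk <- colSums(resp)
    w <- nk / n
    mu <- colSums(resp * x) / nk
    sigma <- sqrt(colSums(resp * outer(x, mu, "-")^2) / nk)
    # Stop once the log-likelihood (before this update) has stabilised.
    loglik <- sum(log(total))
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }

  # Log-likelihood at the final parameters; K means, K sds, K - 1 free weights.
  loglik <- sum(log(rowSums(weighted_dens())))
  n_par <- 3 * K - 1
  list(weights = w, means = mu, sds = sigma, loglik = loglik,
       bic = -2 * loglik + n_par * log(n))
}

# Synthetic bimodal sample (an assumption for illustration only).
set.seed(1)
x <- c(rnorm(600, mean = -3, sd = 1), rnorm(400, mean = 4, sd = 1))

fit1 <- fit_gmm(x, K = 1)
fit2 <- fit_gmm(x, K = 2)
cat("BIC with K = 1:", fit1$bic, "\n")
cat("BIC with K = 2:", fit2$bic, "\n")
```

On the real data the same helper could be applied to a single column, for example fit_gmm(data$x_.4, K = 2), but the full model discussed here is multivariate, which this univariate sketch does not cover.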

Feynman Liang

November 16, 2015