1 Introduction

TBD

2 Tables

2.1 One-Sample t-test (Mean Against Zero)

2.1.1 Memory

## 
##  One Sample t-test
## 
## data:  reduced3VariablesBySum$memory
## t = 25.302, df = 13303, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.07107381 0.08301051
## sample estimates:
##  mean of x 
## 0.07704216
## 
##  One Sample t-test
## 
## data:  reduced3VariablesByMean$memory
## t = 68.779, df = 13303, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.01517020 0.01606024
## sample estimates:
##  mean of x 
## 0.01561522
## 
##  One Sample t-test
## 
## data:  reduced3VariablesByMode$memory
## t = 68.232, df = 13303, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.01520301 0.01610233
## sample estimates:
##  mean of x 
## 0.01565267
## 
##  One Sample t-test
## 
## data:  reduced3VariablesByMedian$memory
## t = 68.467, df = 13303, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.01517661 0.01607121
## sample estimates:
##  mean of x 
## 0.01562391
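
The results above have the layout printed by R's `t.test` for a one-sample test of the mean against zero. As a cross-check, the statistic for the sum-reduced memory variable can be recomputed from the reported mean, the degrees of freedom, and the memory variance listed in the covariance table for the sum-reduced data (Section 2.2.1). The following is a minimal sketch; the vector `x` is synthetic placeholder data, not the report's data.

```r
# Values copied from the report (memory, reduced by sum).
n      <- 13303 + 1     # observations = degrees of freedom + 1
mean_x <- 0.07704216    # sample mean from the t-test output
var_x  <- 0.123343      # memory variance from the sum-reduced covariance table

se     <- sqrt(var_x / n)                             # standard error of the mean
t_stat <- mean_x / se                                 # statistic for H0: mean = 0
ci     <- mean_x + c(-1, 1) * qt(0.975, n - 1) * se   # 95% confidence interval

print(round(t_stat, 3))
print(round(ci, 6))

# The same quantities computed from raw data. `x` is a synthetic placeholder
# (assumption) standing in for a column such as reduced3VariablesBySum$memory.
set.seed(1)
x   <- rexp(200, rate = 10)
fit <- t.test(x, mu = 0)
print(fit)
```

The recomputed statistic and interval agree with the printed values to within rounding, which is consistent with the t-tests and the covariance tables having been computed from the same data.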

2.1.2 CPU

## 
##  One Sample t-test
## 
## data:  reduced3VariablesBySum$cpu
## t = 34.735, df = 13303, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.2496685 0.2795324
## sample estimates:
## mean of x 
## 0.2646005
## 
##  One Sample t-test
## 
## data:  reduced3VariablesByMean$cpu
## t = 86.821, df = 13303, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.06781539 0.07094824
## sample estimates:
##  mean of x 
## 0.06938182
## 
##  One Sample t-test
## 
## data:  reduced3VariablesByMode$cpu
## t = 84.797, df = 13303, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.06749097 0.07068502
## sample estimates:
## mean of x 
##  0.069088
## 
##  One Sample t-test
## 
## data:  reduced3VariablesByMedian$cpu
## t = 86.264, df = 13303, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.06710286 0.07022326
## sample estimates:
##  mean of x 
## 0.06866306

2.1.3 Disk

## 
##  One Sample t-test
## 
## data:  reduced3VariablesBySum$disk
## t = 9.1225, df = 13303, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.05613521 0.08686062
## sample estimates:
##  mean of x 
## 0.07149791
## 
##  One Sample t-test
## 
## data:  reduced3VariablesByMean$disk
## t = 48.418, df = 13303, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.01100127 0.01192961
## sample estimates:
##  mean of x 
## 0.01146544
## 
##  One Sample t-test
## 
## data:  reduced3VariablesByMode$disk
## t = 43.781, df = 13303, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.01088396 0.01190421
## sample estimates:
##  mean of x 
## 0.01139408
## 
##  One Sample t-test
## 
## data:  reduced3VariablesByMedian$disk
## t = 46.282, df = 13303, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.01086872 0.01183006
## sample estimates:
##  mean of x 
## 0.01134939
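
A one-sample t-test checks whether the mean differs from zero; it does not assess the shape of the distribution. If a normality check is also wanted, one possible approach in base R is sketched below. The data are a synthetic, right-skewed placeholder (an assumption), and the choice of Shapiro-Wilk and Kolmogorov-Smirnov is illustrative rather than the method used in this report.

```r
# Synthetic, right-skewed placeholder data (assumption); the report's
# data frames are not reproduced here.
set.seed(42)
x <- rexp(13304, rate = 10)

# t-test: is the mean different from zero? It says nothing about shape.
mean_test <- t.test(x, mu = 0)

# Shapiro-Wilk accepts at most 5000 values, so test a random subsample.
normality_test <- shapiro.test(sample(x, 5000))

# Kolmogorov-Smirnov against a normal with the sample's mean and sd.
# Estimating the parameters from the data makes this p-value approximate.
ks_result <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))

print(mean_test$p.value)
print(normality_test$p.value)
print(ks_result$p.value)
```

With samples of this size, formal normality tests reject even small departures from normality, so graphical checks such as the QQ plots in Section 3.3 are usually read alongside them.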

2.2 Covariance Analysis

        memory    cpu       disk
memory  0.001205  0.001458  0.000186
cpu     0.001458  0.011665  0.001936
disk    0.000186  0.001936  0.004905
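
A covariance matrix of this shape can be obtained with `cov` applied to a data frame of the three variables. The sketch below uses a synthetic data frame named `demo` (a hypothetical stand-in with the report's column names), since the underlying data are not included here.

```r
# Synthetic placeholder data with the report's three column names (assumption).
set.seed(7)
n    <- 1000
cpu  <- runif(n, 0, 0.4)
demo <- data.frame(
  memory = 0.1 * cpu + rnorm(n, sd = 0.02),
  cpu    = cpu,
  disk   = 0.2 * cpu + rnorm(n, sd = 0.05)
)

# Sample covariance matrix (denominator n - 1), rounded to six decimals
# to match the layout of the tables in this section.
cov_matrix <- round(cov(demo), 6)
print(cov_matrix)
```

The diagonal holds each variable's variance and the matrix is symmetric, which is why each off-diagonal value in the tables appears twice.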

2.2.1 Data Reduced by Sum

        memory    cpu       disk
memory  0.123343  0.187899  0.070163
cpu     0.187899  0.772036  0.374515
disk    0.070163  0.374515  0.817227

2.2.2 Data Reduced by Mean

        memory    cpu       disk
memory  0.000686  0.000862  0.000056
cpu     0.000862  0.008496  0.000943
disk    0.000056  0.000943  0.000746

2.2.3 Data Reduced by Mode

        memory    cpu       disk
memory  0.000700  0.000860  0.000052
cpu     0.000860  0.008831  0.000972
disk    0.000052  0.000972  0.000901

2.2.4 Data Reduced by Median

        memory    cpu       disk
memory  0.000693  0.000844  0.000051
cpu     0.000844  0.008429  0.000940
disk    0.000051  0.000940  0.000800
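
Covariances depend on the scale of the variables, so the sum-reduced values are not directly comparable with the others. One way to compare the reductions is to convert each covariance matrix to a correlation matrix with `cov2cor`. The sketch below does this for the sum and mean tables, with the values copied from this section; the comparison itself is an illustration and not part of the original analysis.

```r
vars <- c("memory", "cpu", "disk")

# Covariance matrices copied from the tables for the sum and mean reductions.
cov_sum <- matrix(c(0.123343, 0.187899, 0.070163,
                    0.187899, 0.772036, 0.374515,
                    0.070163, 0.374515, 0.817227),
                  nrow = 3, byrow = TRUE, dimnames = list(vars, vars))
cov_mean <- matrix(c(0.000686, 0.000862, 0.000056,
                     0.000862, 0.008496, 0.000943,
                     0.000056, 0.000943, 0.000746),
                   nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# cov2cor divides each entry by the product of the two standard deviations.
cor_sum  <- cov2cor(cov_sum)
cor_mean <- cov2cor(cov_mean)

print(round(cor_sum, 3))
print(round(cor_mean, 3))
```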

2.3 Distance covariance

Reduction  Distance covariance
Sum        0.0347703877060455
Mean       0.00371654319485833
Mode       0.00379469863386652
Median     0.00370310703282772
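
The report does not state which variables were paired or which function produced these values. For orientation, the sample distance covariance between two sets of observations can be sketched in base R as below. The function name `distance_covariance` and the synthetic inputs are assumptions for illustration; the `dcov` function in the energy package is a commonly used implementation.

```r
# Sample distance covariance (V-statistic form) in base R.
# Builds two n-by-n distance matrices, so memory use grows as n^2.
distance_covariance <- function(x, y) {
  double_center <- function(d) {
    d <- sweep(d, 1, rowMeans(d))   # subtract row means
    sweep(d, 2, colMeans(d))        # then column means of the result
  }
  A <- double_center(as.matrix(dist(x)))
  B <- double_center(as.matrix(dist(y)))
  # mean(A * B) is the squared statistic; guard against tiny negative
  # values caused by floating-point error before taking the root.
  sqrt(max(0, mean(A * B)))
}

# Synthetic placeholder data (assumption).
set.seed(3)
x           <- runif(300)
y_dependent <- x^2 + rnorm(300, sd = 0.05)
y_constant  <- rep(1, 300)

print(distance_covariance(x, y_dependent))  # positive: y depends on x
print(distance_covariance(x, y_constant))   # zero: y carries no information
```

Unlike ordinary covariance, this statistic also responds to non-linear dependence, which may be why it was reported alongside the covariance tables.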

3 Plots

3.1 Empirical Cumulative Distribution Function

3.1.1 Memory

3.1.2 CPU

3.1.3 Disk
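
The ECDF plots in Section 3.1 can be reproduced along the following lines. This is a sketch on synthetic placeholder data; `plot_ecdf` is a hypothetical helper name, not a function from the report.

```r
# Synthetic placeholder for one reduced variable (assumption).
set.seed(11)
memory <- rexp(500, rate = 10)

# ecdf() returns a step function: Fn(q) is the share of values <= q.
Fn <- ecdf(memory)
print(Fn(median(memory)))
print(Fn(c(0.05, 0.1, 0.2)))

# Hypothetical helper that draws the ECDF for one variable.
plot_ecdf <- function(values, label) {
  plot(ecdf(values), main = paste("ECDF:", label),
       xlab = label, ylab = "Fn(x)")
}
```

Calling `plot_ecdf(memory, "memory")` draws the curve on the active graphics device.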

3.2 Boxplot

3.2.1 Memory

3.2.2 CPU

3.2.3 Disk
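
Boxplots such as those in Section 3.2 can place the four reductions side by side for one variable. The sketch below assumes a named list of vectors, one per reduction; the values are synthetic placeholders and `plot_boxes` is a hypothetical helper name.

```r
# Synthetic placeholders for one variable under each reduction (assumption).
set.seed(12)
reductions <- list(
  Sum    = rexp(500, rate = 2),
  Mean   = rexp(500, rate = 10),
  Mode   = rexp(500, rate = 10),
  Median = rexp(500, rate = 10)
)

# With plot = FALSE, boxplot() returns the statistics it would draw:
# each column holds lower whisker, lower hinge, median, upper hinge,
# upper whisker for one group.
box_stats <- boxplot(reductions, plot = FALSE)
print(box_stats$stats)

# Hypothetical helper that draws the grouped boxplot.
plot_boxes <- function(groups, label) {
  boxplot(groups, main = paste("Boxplot:", label), ylab = label)
}
```

Calling `plot_boxes(reductions, "memory")` draws one box per reduction on a shared axis.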

3.3 Normal QQ plot

3.3.1 Memory

3.3.2 CPU

3.3.3 Disk
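
A normal QQ plot compares the sorted sample with the quantiles expected under a normal distribution; points close to the reference line suggest approximate normality. The plots in Section 3.3 can be sketched as follows, using synthetic placeholder data and a hypothetical helper named `plot_qq`.

```r
# Synthetic, right-skewed placeholder data (assumption).
set.seed(13)
cpu <- rexp(500, rate = 10)

# With plot.it = FALSE, qqnorm() returns the plotted coordinates:
# x = theoretical normal quantiles, y = the sample values.
qq <- qqnorm(cpu, plot.it = FALSE)

# Correlation between the two sets of quantiles; it approaches 1 only
# when the points lie close to a straight line.
print(cor(qq$x, qq$y))

# Hypothetical helper that draws the QQ plot with its reference line.
plot_qq <- function(values, label) {
  qqnorm(values, main = paste("Normal Q-Q plot:", label))
  qqline(values)
}
```

Calling `plot_qq(cpu, "cpu")` draws the points and the line through the first and third quartiles.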

3.4 Histogram

3.4.1 Memory

[Figures: two histograms, each captioned "Sum function. Memory Variable"]

3.4.2 CPU

[Figures: two histograms, each captioned "Sum function. Memory Variable"]

3.4.3 Disk

[Figures: two histograms, each captioned "Sum function. Memory Variable"]
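
Histograms such as those in Section 3.4 can be sketched as below. The data are synthetic placeholders, the bin count of 30 is an arbitrary choice, and `plot_hist` is a hypothetical helper name.

```r
# Synthetic placeholder for one reduced variable (assumption).
set.seed(14)
disk <- rexp(500, rate = 10)

# With plot = FALSE, hist() returns the bins instead of drawing them.
# `breaks = 30` is only a suggestion; R picks nearby "pretty" breakpoints.
h <- hist(disk, breaks = 30, plot = FALSE)
print(data.frame(mid = h$mids, count = h$counts))

# Hypothetical helper that draws the histogram.
plot_hist <- function(values, label) {
  hist(values, breaks = 30, main = paste("Histogram:", label), xlab = label)
}
```

Calling `plot_hist(disk, "disk")` draws the bars on the active graphics device.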

3.5 Density Plots

3.5.1 Memory

[Figures: two density plots, each captioned "Sum function. Memory Variable"]

3.5.2 CPU

[Figures: two density plots, each captioned "Sum function. Memory Variable"]

3.5.3 Disk

[Figures: two density plots, each captioned "Sum function. Memory Variable"]
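
Density plots such as those in Section 3.5 are typically drawn from a kernel density estimate. The sketch below uses `density` with its defaults on synthetic placeholder data; `plot_density` is a hypothetical helper name.

```r
# Synthetic placeholder for one reduced variable (assumption).
set.seed(15)
memory <- rexp(500, rate = 10)

# Gaussian kernel with the default bandwidth, evaluated on an equally
# spaced grid. For non-negative data the estimate spills slightly below zero.
d <- density(memory)

# Riemann-sum check that the estimated density integrates to about 1.
area <- sum(d$y) * diff(d$x[1:2])
print(area)

# Hypothetical helper that draws the density curve.
plot_density <- function(values, label) {
  plot(density(values), main = paste("Density:", label), xlab = label)
}
```

Calling `plot_density(memory, "memory")` draws the curve on the active graphics device.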
