Multivariate Analysis Notes

Patrick Oster, Dr. Xaoli Kong

Spring Semester, 2019

Intro

Basic Sample Statistics

Mean; Var-Cov; Correlation Coefficient Matrix

Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
##     5.843333     3.057333     3.758000     1.199333
##              Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length   0.68112222 -0.04215111    1.2658200   0.5128289
## Sepal.Width   -0.04215111  0.18871289   -0.3274587  -0.1208284
## Petal.Length   1.26582000 -0.32745867    3.0955027   1.2869720
## Petal.Width    0.51282889 -0.12082844    1.2869720   0.5771329
##              Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length    0.6856935  -0.0424340    1.2743154   0.5162707
## Sepal.Width    -0.0424340   0.1899794   -0.3296564  -0.1216394
## Petal.Length    1.2743154  -0.3296564    3.1162779   1.2956094
## Petal.Width     0.5162707  -0.1216394    1.2956094   0.5810063
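
The output above can be reproduced from the built-in `iris` data. A minimal sketch follows, assuming the first printed matrix uses divisor n and the second uses divisor n - 1 (the default of `cov()`); the printed diagonal entries 0.6811 and 0.6857 are consistent with that reading. No correlation matrix appears in the printed output, so `cor()` is included here only for completeness.

```r
# Numeric columns of the built-in iris data (150 observations, 4 variables)
X <- iris[, 1:4]
n <- nrow(X)

xbar <- colMeans(X)          # sample mean vector
S    <- cov(X)               # covariance matrix, divisor n - 1
Sn   <- S * (n - 1) / n      # covariance matrix, divisor n
R    <- cor(X)               # correlation coefficient matrix

# The correlation matrix is the covariance matrix rescaled by the
# standard deviations: R = D^{-1/2} S D^{-1/2}
D_inv   <- diag(1 / sqrt(diag(S)))
R_check <- D_inv %*% S %*% D_inv
```

The names `xbar`, `S`, `Sn`, `R` and `R_check` are chosen for this sketch and do not come from the notes.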

L1: Matrix

L2: Random Vectors & Moments

L3: Multivariate Normal Distribution

Stiff Data

Testing Normality

Univariate Normality

V1 V2 V3 V4
1889 1651 1561 1778
2403 2048 2087 2197
2119 1700 1815 2222
1645 1627 1110 1533
1976 1916 1614 1883
1712 1712 1439 1546

## 
##  Shapiro-Wilk normality test
## 
## data:  x1
## W = 0.93068, p-value = 0.05118
## 
##  Shapiro-Wilk normality test
## 
## data:  x2
## W = 0.91274, p-value = 0.01746
## 
##  Shapiro-Wilk normality test
## 
## data:  x3
## W = 0.93258, p-value = 0.05751
## 
##  Shapiro-Wilk normality test
## 
## data:  x4
## W = 0.96127, p-value = 0.3337
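
The four tests above apply `shapiro.test()` to one variable at a time. A hedged sketch of running it over every column at once is given below. Only six rows of the stiffness data are shown in these notes, so the numeric `iris` columns are used as a stand-in; the results will not match the output above.

```r
# Stand-in data: the four numeric iris columns (NOT the stiffness data)
X <- iris[, 1:4]

# Apply the Shapiro-Wilk test to every column and collect W and the p-value
sw <- t(sapply(X, function(col) {
  res <- shapiro.test(col)
  c(W = unname(res$statistic), p.value = res$p.value)
}))

# Flag the variables for which normality is rejected at the 5% level
# (the logical flag is stored as 0/1 because sw is a numeric matrix)
sw <- cbind(sw, reject = sw[, "p.value"] < 0.05)
print(round(sw, 4))
```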

Bivariate Normality

## [1] 0.6

Chi-Square Plot

## [1] 3
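
A chi-square plot compares the ordered squared Mahalanobis distances with chi-square quantiles; under multivariate normality the points should lie close to a straight line through the origin with slope 1. The sketch below uses two `iris` columns as a stand-in bivariate sample. It also computes the share of observations inside the 50% probability contour, which should be near one half for bivariate normal data. The printed values above may be quantities of this kind, but the notes do not say what they are.

```r
# Stand-in bivariate sample (NOT the stiffness data)
X <- as.matrix(iris[iris$Species == "setosa", 1:2])
n <- nrow(X); p <- ncol(X)

# Squared Mahalanobis distances from the sample mean
d2 <- mahalanobis(X, colMeans(X), cov(X))

# Proportion of observations inside the 50% chi-square contour
prop_inside <- mean(d2 <= qchisq(0.5, df = p))

# Chi-square plot: ordered distances against chi-square quantiles
q <- qchisq((seq_len(n) - 0.5) / n, df = p)
plot(q, sort(d2),
     xlab = "Chi-square quantile", ylab = "Ordered squared distance")
abline(0, 1)
```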

## $multivariateNormality
##      Test        H     p value MVN
## 1 Royston 9.823858 0.009534665  NO
## 
## $univariateNormality
##           Test  Variable Statistic   p value Normality
## 1 Shapiro-Wilk    x1        0.9307    0.0512    YES   
## 2 Shapiro-Wilk    x2        0.9127    0.0175    NO    
## 3 Shapiro-Wilk    x3        0.9326    0.0575    YES   
## 4 Shapiro-Wilk    x4        0.9613    0.3337    YES   
## 
## $Descriptives
##     n     Mean  Std.Dev Median  Min  Max    25th    75th      Skew
## x1 30 1906.100 324.9866 1863.0 1325 2983 1715.25 2057.25 1.0380842
## x2 30 1749.533 318.6065 1680.0 1170 2794 1595.50 1888.75 1.1435912
## x3 30 1509.133 303.1783 1466.0 1002 2412 1295.75 1623.75 0.9800274
## x4 30 1724.967 322.8436 1674.5 1176 2581 1520.25 1880.75 0.5978431
##       Kurtosis
## x1  2.03586397
## x2  1.94986381
## x3  0.99683699
## x4 -0.04626509

Detecting Outliers

##      scale.x1.  scale.x2.   scale.x3.   scale.x4.
## 1  -0.05261755 -0.3092634  0.17107644  0.16426945
## 2   1.52898605  0.9367877  1.90602908  1.46211166
## 3   0.65510390 -0.1554687  1.00886726  1.53954854
## 4  -0.80341770 -0.3845914 -1.31649701 -0.59461204
## 5   0.21508578  0.5224835  0.34589106  0.48950437
## 6  -0.59725537 -0.1178047 -0.23132702 -0.55434486
## 7   0.11354314 -0.2025487 -0.78545638 -0.16716042
## 8   0.60894816  0.2211714  0.68562513  0.46162709
## 9   3.31367493  3.2782337  2.97800552  2.65154223
## 10 -0.49571272 -0.4693354 -0.41273841 -0.67204892
## 11 -0.60340947 -0.4975834  0.02924572 -0.17955033
## 12  0.43047927  0.4942355  0.38877012  0.53596650
## 13 -0.20339299  0.2870835  0.28322167  0.04966286
## 14 -0.12031265 -0.2025487 -0.05321401 -0.14547810
## 15 -0.14492905 -0.3155407 -0.39624647 -0.03396898
## 16  0.14739069  1.2537931 -1.08560978 -1.37517585
## 17 -1.78807364 -1.8189625 -1.67272303 -1.70041077
## 18 -1.49883096 -1.1880903 -0.84812577 -1.29154401
## 19 -0.24031759 -0.3626207  0.30631040  0.09302751
## 20 -0.55725372 -0.4881674 -0.64692404 -0.24459731
## 21  1.13820072  1.3793398  0.12489900  1.19572877
## 22 -0.02184705 -0.4253941 -0.28739963 -0.76807066
## 23 -0.84034230 -0.7423995 -0.72278698 -0.64726912
## 24  0.47663501  0.3686888  0.45143951  0.96651559
## 25 -0.15416020 -0.8051729 -0.50509331 -0.59461204
## 26 -0.55109962 -1.0594050 -0.89430321 -0.79285046
## 27  0.80587934  0.4597102  0.63285091  0.33772807
## 28 -0.77264721 -0.2339354 -0.31378674 -0.39637361
## 29  1.29205321  1.7308706  1.83346452  1.57671825
## 30 -1.28036042 -1.1535650 -0.97346455 -1.36588342
##  [1]  3 25 27 24  8 12 22 10 29  4 11  2 16  1  6 30 19 20  7  9 28 23  5
## [24] 14 21 18 13 17 26 15
##              m_dist
##  [1,] 14  0.1295714
##  [2,] 12  0.4635159
##  [3,]  1  0.6000129
##  [4,] 10  0.7665400
##  [5,] 23  0.7962096
##  [6,] 15  1.0792484
##  [7,] 19  1.3632124
##  [8,]  5  1.3980776
##  [9,] 20  1.4649908
## [10,]  8  1.4876570
## [11,] 11  1.9307771
## [12,]  6  2.2191409
## [13,] 27  2.3816050
## [14,] 24  2.5385575
## [15,] 30  2.5838186
## [16,] 13  2.6959024
## [17,] 28  2.9951752
## [18,] 26  3.3979804
## [19,] 17  3.5018290
## [20,] 18  3.9900603
## [21,] 25  4.5767867
## [22,]  7  4.9883498
## [23,] 22  5.0557446
## [24,]  4  5.2076098
## [25,]  2  5.4770196
## [26,] 29  6.2837628
## [27,]  3  7.6166439
## [28,] 21  9.8980384
## [29,]  9 12.2647550
## [30,] 16 16.8474070
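
The three outputs above are the standardized values, an ordering of the observations, and the sorted squared Mahalanobis distances; observations 9 and 16 have the largest distances (12.26 and 16.85). A minimal sketch of the same steps follows, using a stand-in sample from `iris`. The chi-square cutoff is one common choice and is an assumption here, not something the notes state.

```r
# Stand-in sample (NOT the stiffness data)
X <- as.matrix(iris[iris$Species == "setosa", 1:4])

# Step 1: standardized values; a large |z| marks a univariate outlier candidate
Z <- scale(X)
big_z <- which(abs(Z) > 3, arr.ind = TRUE)

# Step 2: squared Mahalanobis distances, sorted in increasing order
m_dist <- mahalanobis(X, colMeans(X), cov(X))
ranked <- cbind(obs = order(m_dist), m_dist = sort(m_dist))

# Step 3: compare the largest distances with an upper chi-square quantile
cutoff <- qchisq(0.995, df = ncol(X))
suspects <- ranked[ranked[, "m_dist"] > cutoff, , drop = FALSE]
```

The second normality report that follows has n = 28, which is consistent with two observations having been dropped before the tests were rerun.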

## $multivariateNormality
##      Test        H   p value MVN
## 1 Royston 1.098338 0.6271166 YES
## 
## $univariateNormality
##           Test  Variable Statistic   p value Normality
## 1 Shapiro-Wilk    x1        0.9847    0.9439    YES   
## 2 Shapiro-Wilk    x2        0.9664    0.4871    YES   
## 3 Shapiro-Wilk    x3        0.9672    0.5070    YES   
## 4 Shapiro-Wilk    x4        0.9660    0.4779    YES   
## 
## $Descriptives
##     n     Mean  Std.Dev Median  Min  Max    25th    75th       Skew
## x1 28 1865.929 262.1619 1857.5 1325 2403 1711.50 2049.75 0.08994538
## x2 28 1697.964 244.8618 1663.0 1170 2301 1593.25 1847.50 0.39091767
## x3 28 1488.643 253.1536 1466.0 1002 2087 1307.25 1617.25 0.49661284
## x4 28 1710.250 277.9986 1674.5 1176 2234 1528.75 1876.25 0.25921958
##      Kurtosis
## x1 -0.5084972
## x2  0.1961808
## x3  0.0516768
## x4 -0.6484407

Transformations

## yjPower Transformations to Multinormality 
##    Est Power Rounded Pwr Wald Lwr Bnd Wald Upr Bnd
## x1    0.0939           1      -0.9911       1.1790
## x2   -0.2802           0      -1.5129       0.9525
## x3    0.1478           1      -0.9585       1.2542
## x4    0.7546           1      -0.5292       2.0385
## 
##  Likelihood ratio test that all transformation parameters are equal to 0
##                                  LRT df    pval
## LR test, lambda = (0 0 0 0) 1.980807  4 0.73929

## $multivariateNormality
##      Test        H   p value MVN
## 1 Royston 1.652091 0.5200565 YES
## 
## $univariateNormality
##           Test  Variable Statistic   p value Normality
## 1 Shapiro-Wilk    x1        0.9742    0.6577    YES   
## 2 Shapiro-Wilk    x2        0.9626    0.3612    YES   
## 3 Shapiro-Wilk    x3        0.9795    0.8118    YES   
## 4 Shapiro-Wilk    x4        0.9831    0.9005    YES   
## 
## $Descriptives
##     n     Mean   Std.Dev   Median      Min      Max     25th     75th
## x1 30 7.539650 0.1631929 7.529941 7.189168 8.000685 7.447309 7.629120
## x2 30 7.452334 0.1721702 7.426545 7.064759 7.935230 7.374941 7.543648
## x3 30 7.301165 0.1910199 7.290123 6.909753 7.788212 7.166816 7.392488
## x4 30 7.436553 0.1834145 7.423268 7.069874 7.855932 7.326618 7.539424
##         Skew    Kurtosis
## x1 0.3660786  0.69728520
## x2 0.4814651  0.73170914
## x3 0.4107699  0.07436358
## x4 0.1581748 -0.44895116
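
The likelihood ratio test above does not reject lambda = (0, 0, 0, 0), which points to a log-type transformation. The sketch below writes out the Yeo-Johnson family by hand in base R (the function name `yeo_johnson` is made up for this example; it is not the function that produced the output above). At lambda = 0 the family gives log(x + 1) for nonnegative x. The transformed descriptives above (for example, the minimum 7.189168 for x1, whose raw minimum is 1325) are consistent with the plain natural log log(x) having been applied instead.

```r
# Yeo-Johnson power transform for a numeric vector (hand-written sketch)
yeo_johnson <- function(x, lambda) {
  out <- numeric(length(x))
  pos <- x >= 0
  # Nonnegative values: Box-Cox applied to x + 1
  if (abs(lambda) > 1e-8) {
    out[pos] <- ((x[pos] + 1)^lambda - 1) / lambda
  } else {
    out[pos] <- log(x[pos] + 1)
  }
  # Negative values: Box-Cox with power 2 - lambda applied to 1 - x
  if (abs(lambda - 2) > 1e-8) {
    out[!pos] <- -((1 - x[!pos])^(2 - lambda) - 1) / (2 - lambda)
  } else {
    out[!pos] <- -log(1 - x[!pos])
  }
  out
}

# First six x1 values shown earlier in these notes
x1 <- c(1889, 2403, 2119, 1645, 1976, 1712)
y_yj  <- yeo_johnson(x1, lambda = 0)  # equals log(x1 + 1)
y_log <- log(x1)                      # plain log, as the descriptives suggest
```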

L4: Inference for Mean Vectors

Sweat Data

Comparing Perspiration Between 2 Groups

Sample Statistics Sweat

We want to see whether perspiration in healthy Latin American women aged 50-65 is the same as in their North American counterparts. Perspiration is quantified by the following variables:
* X1: Sweat Rate
* X2: Sodium in Sweat
* X3: Potassium in Sweat

V1 V2 V3
3.7 48.5 9.3
5.7 65.1 8.0
3.8 47.2 10.9
3.2 53.2 12.0
3.1 55.5 9.7
4.6 36.1 7.9

Waste Water Data

Municipal wastewater treatment plants are required by law to monitor their discharges into rivers and streams on a regular basis. Concern about the reliability of data from one of these self-monitoring programs led to a study in which samples of effluent were divided and sent to two laboratories.

Compare the two labs' measurements on two variables:
* X1 & X3: Biochemical oxygen demand (BOD) from two labs
* X2 & X4: Suspended solids (SS) from two labs

Alternate Method Using Matrix

stat crit reject p
13.63931 9.458877 TRUE 0.0208278
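
The row above has the shape of a paired-comparison Hotelling T^2 test on the lab differences. A minimal sketch of the matrix computation is given below; `paired_T2` is a name made up for this example. The second half cross-checks the printed critical value and p-value from the printed statistic, assuming n = 11 sample pairs and p = 2 variables. The value of n is not stated in the notes; it is inferred here because it reproduces the printed numbers.

```r
# Paired-comparison Hotelling T^2 from a matrix of differences D (n x p)
paired_T2 <- function(D, alpha = 0.05) {
  D <- as.matrix(D)
  n <- nrow(D); p <- ncol(D)
  dbar <- colMeans(D)
  T2 <- n * drop(t(dbar) %*% solve(cov(D)) %*% dbar)
  crit <- p * (n - 1) / (n - p) * qf(1 - alpha, p, n - p)
  pval <- pf(T2 * (n - p) / (p * (n - 1)), p, n - p, lower.tail = FALSE)
  data.frame(stat = T2, crit = crit, reject = T2 > crit, p = pval)
}

# Cross-check against the printed row, assuming n = 11 pairs and p = 2
n <- 11; p <- 2; T2 <- 13.63931
crit <- p * (n - 1) / (n - p) * qf(0.95, p, n - p)
pval <- pf(T2 * (n - p) / (p * (n - 1)), p, n - p, lower.tail = FALSE)
```

With the full data one would call `paired_T2(cbind(X1 - X3, X2 - X4))`, pairing BOD with BOD and SS with SS.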

Confidence Region/Interval for Delta

## $`One-at-a-time Intervals`
##           lower     upper
## BOD -18.8467298  0.119457
## SS   -0.4725958 27.018050
## 
## $`T^2 Intervals`
##          lower    upper
## BOD -22.453272  3.72600
## SS   -5.700119 32.24557
## 
## $`Bonferroni Simultaneous Intervals`
##          lower     upper
## BOD -20.573107  1.845835
## SS   -2.974903 29.520358
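
All three interval types share the form (mean difference) ± multiplier × (standard error); only the multiplier changes. The sketch below recovers the mean differences and standard errors from the printed one-at-a-time intervals and rebuilds the other two sets, assuming n = 11 pairs and alpha = 0.05 (n is inferred, not stated in the notes). The function name `diff_intervals` is made up for this example.

```r
# Intervals for the mean differences, given their standard errors
diff_intervals <- function(dbar, se, n, alpha = 0.05) {
  p <- length(dbar)
  mult <- c(
    one_at_a_time = qt(1 - alpha / 2, n - 1),
    T2 = sqrt(p * (n - 1) / (n - p) * qf(1 - alpha, p, n - p)),
    bonferroni = qt(1 - alpha / (2 * p), n - 1)
  )
  lapply(mult, function(m) cbind(lower = dbar - m * se, upper = dbar + m * se))
}

# Recover dbar and se from the printed one-at-a-time intervals
n  <- 11
lo <- c(BOD = -18.8467298, SS = -0.4725958)
hi <- c(0.119457, 27.018050)
dbar <- (lo + hi) / 2
se   <- (hi - lo) / 2 / qt(0.975, n - 1)

ci <- diff_intervals(dbar, se, n)
```

The T^2 intervals are the widest because they hold simultaneously for every linear combination of the mean differences; the Bonferroni intervals only protect the p individual components.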

Turtles Dataset

Two Sample Independent Tests

Measurements on the carapaces of 24 female and 24 male painted turtles:
* X1: Length
* X2: Width
* X3: Height
* X4: Sex (female and male)

2-Sample Independent Confidence Intervals

##            [,1]        [,2]
## [1,] -0.2926638 -0.05776762
## [2,] -0.2365537 -0.05411666
## [3,] -0.3451377 -0.12906223
##            [,1]        [,2]
## [1,] -0.2734025 -0.07702893
## [2,] -0.2215940 -0.06907636
## [3,] -0.3274197 -0.14678026
##             Length       Width      Height
## Length 0.011072004 0.008019142 0.008159648
## Width  0.008019142 0.006416726 0.006005271
## Height 0.008159648 0.006005271 0.006772758
##            Length      Width     Height
## Length 0.02640563 0.02011195 0.02491758
## Width  0.02011195 0.01619045 0.01942430
## Height 0.02491758 0.01942430 0.02493980
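
The first two matrices above are interval sets for the three mean differences; their relative widths are consistent with T^2 simultaneous intervals followed by Bonferroni intervals for n1 = n2 = 24 and p = 3. The last two matrices are sample covariance matrices, presumably one per sex. A hedged sketch of the pooled-covariance computation follows. The turtle data are not reproduced in these notes, so two `iris` species serve as stand-in groups, and `two_sample_T2` is a name made up for this example.

```r
# Two-sample Hotelling T^2 with pooled covariance, plus simultaneous intervals
two_sample_T2 <- function(X1, X2, alpha = 0.05) {
  X1 <- as.matrix(X1); X2 <- as.matrix(X2)
  n1 <- nrow(X1); n2 <- nrow(X2); p <- ncol(X1)
  d  <- colMeans(X1) - colMeans(X2)
  Sp <- ((n1 - 1) * cov(X1) + (n2 - 1) * cov(X2)) / (n1 + n2 - 2)
  V  <- (1 / n1 + 1 / n2) * Sp
  T2 <- drop(t(d) %*% solve(V) %*% d)
  c2 <- (n1 + n2 - 2) * p / (n1 + n2 - p - 1) *
    qf(1 - alpha, p, n1 + n2 - p - 1)
  se <- sqrt(diag(V))
  tb <- qt(1 - alpha / (2 * p), n1 + n2 - 2)
  list(T2 = T2, crit = c2,
       T2_int   = cbind(d - sqrt(c2) * se, d + sqrt(c2) * se),
       bonf_int = cbind(d - tb * se, d + tb * se))
}

# Stand-in groups from iris (NOT the turtle data)
g1 <- iris[iris$Species == "setosa", 1:3]
g2 <- iris[iris$Species == "versicolor", 1:3]
res <- two_sample_T2(g1, g2)
```

Pooling assumes the two groups share a common covariance matrix, which is worth checking before relying on the test.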

Iris Data

One-Way MANOVA Manually

V1 V2 V3 V4 V5
5.1 3.5 1.4 0.2 1
4.9 3.0 1.4 0.2 1
4.7 3.2 1.3 0.2 1
4.6 3.1 1.5 0.2 1
5.0 3.6 1.4 0.2 1
5.4 3.9 1.7 0.4 1
## [1] 1 2 3
## [1] 3
## G
##  1  2  3 
## 50 50 50

## [1] 2
## [1] 147
## [1] 299.936
## [1] 2.402562
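
The printed values above (3 groups of 50, degrees of freedom 2 and 147, F = 299.936, critical value 2.402562) can be reproduced from the built-in `iris` data. The sketch below assumes the two responses are Sepal.Width and Petal.Width, which is inferred from the sums of squares printed in the next subsection rather than stated in the notes. It uses the exact F transformation of Wilks' lambda that holds for p = 2 responses.

```r
Y <- as.matrix(iris[, c("Sepal.Width", "Petal.Width")])
G <- iris$Species
n <- nrow(Y); p <- ncol(Y); g <- nlevels(G)

# Within-group (W) and total (Tot) sums of squares and cross-products
W <- Reduce(`+`, lapply(split(as.data.frame(Y), G), function(d) {
  (nrow(d) - 1) * cov(d)
}))
Tot <- (n - 1) * cov(Y)

# Wilks' lambda: ratio of generalized variances
wilks <- det(W) / det(Tot)

# Exact F transformation of Wilks' lambda, valid for p = 2 responses
F_stat <- (n - g - 1) / (g - 1) * (1 - sqrt(wilks)) / sqrt(wilks)
F_crit <- qf(0.95, 2 * (g - 1), 2 * (n - g - 1))
```

Since `F_stat` far exceeds `F_crit`, equality of the three mean vectors is rejected at the 5% level.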

One-Way MANOVA Function

##                  Df    Wilks approx F num Df den Df    Pr(>F)    
## as.character(G)   2 0.038316   299.94      4    292 < 2.2e-16 ***
## Residuals       147                                              
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##  Response 1 :
##                  Df Sum Sq Mean Sq F value    Pr(>F)    
## as.character(G)   2 11.345  5.6725   49.16 < 2.2e-16 ***
## Residuals       147 16.962  0.1154                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response 2 :
##                  Df Sum Sq Mean Sq F value    Pr(>F)    
## as.character(G)   2 80.413  40.207  960.01 < 2.2e-16 ***
## Residuals       147  6.157   0.042                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
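
The same analysis can be run with base R's `manova()`. The sketch below uses the built-in `iris` columns Sepal.Width and Petal.Width with Species as the factor; these are assumed to correspond to Response 1, Response 2 and `G` in the output above, since their sums of squares match the printed values.

```r
fit <- manova(cbind(Sepal.Width, Petal.Width) ~ Species, data = iris)

# Multivariate test using Wilks' lambda
mv <- summary(fit, test = "Wilks")
print(mv)

# Follow-up univariate ANOVA for each response
uni <- summary.aov(fit)
print(uni)
```

`summary()` on a `manova` fit defaults to Pillai's trace, so `test = "Wilks"` has to be requested explicitly to match the table above.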

Plastic Data

Factor 1: Change in the rate of extrusion (0 = low, -10%; 1 = high, +10%)
Factor 2: Amount of additive (0 = low, 1.0%; 1 = high, 1.5%)
* X1: Tear resistance
* X2: Gloss
* X3: Opacity

Two-Way MANOVA

V1 V2 V3 V4 V5
0 0 6.5 9.5 4.4
0 0 6.2 9.9 6.4
0 0 5.8 9.6 3.0
0 0 6.5 9.6 4.1
0 0 6.5 9.2 0.8
0 1 6.9 9.1 5.7
##  Response V3 :
##             Df Sum Sq Mean Sq F value   Pr(>F)   
## V1           1 1.7405 1.74050 15.7868 0.001092 **
## V2           1 0.7605 0.76050  6.8980 0.018330 * 
## V1:V2        1 0.0005 0.00050  0.0045 0.947143   
## Residuals   16 1.7640 0.11025                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response V4 :
##             Df Sum Sq Mean Sq F value  Pr(>F)  
## V1           1 1.3005 1.30050  7.9178 0.01248 *
## V2           1 0.6125 0.61250  3.7291 0.07139 .
## V1:V2        1 0.5445 0.54450  3.3151 0.08740 .
## Residuals   16 2.6280 0.16425                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response V5 :
##             Df Sum Sq Mean Sq F value Pr(>F)
## V1           1  0.421  0.4205  0.1036 0.7517
## V2           1  4.901  4.9005  1.2077 0.2881
## V1:V2        1  3.960  3.9605  0.9760 0.3379
## Residuals   16 64.924  4.0578
##           Df   Wilks approx F num Df den Df   Pr(>F)   
## V1         1 0.38186   7.5543      3     14 0.003034 **
## V2         1 0.52303   4.2556      3     14 0.024745 * 
## V1:V2      1 0.77711   1.3385      3     14 0.301782   
## Residuals 16                                           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
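
The tables above come from a balanced 2 x 2 design with 5 replicates per cell (20 runs, 16 residual degrees of freedom). Only six rows of the plastic data are shown in these notes, so the sketch below uses simulated responses purely to show the model call; its numbers will not match the output above. The variable names `rate`, `additive`, `tear`, `gloss` and `opacity` are chosen for this example.

```r
set.seed(1)  # simulated responses, NOT the plastic film measurements
design <- expand.grid(rep = 1:5, rate = factor(0:1), additive = factor(0:1))
Y <- cbind(tear    = rnorm(20, 6.5, 0.4),
           gloss   = rnorm(20, 9.5, 0.4),
           opacity = rnorm(20, 4, 2))

# Two main effects plus their interaction
fit <- manova(Y ~ rate * additive, data = design)

mv  <- summary(fit, test = "Wilks")  # multivariate tests per term
uni <- summary.aov(fit)              # one two-way ANOVA per response
```

In the output above, the interaction is not significant in the multivariate test (p = 0.30), so the main effects of each factor can be interpreted directly.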

Box’s M Test for Equality of Covariance

## 
##  Box's M-test for Homogeneity of Covariance Matrices
## 
## data:  cbind(plastic$V3, plastic$V4, plastic$V5)
## Chi-Sq (approx.) = 7.1269, df = 6, p-value = 0.3093
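
Box's M compares the log-determinants of the group covariance matrices with that of the pooled matrix. The printed df = 6 equals p(p + 1)(g - 1)/2 for p = 3 responses and g = 2 groups. The sketch below is a hand-written base R version of the chi-square approximation (`box_m` is a name made up for this example, not the function that produced the output above), run on `iris` as stand-in data.

```r
# Box's M test with the chi-square approximation (hand-written sketch)
box_m <- function(X, group) {
  X <- as.matrix(X); group <- factor(group)
  p <- ncol(X); g <- nlevels(group); N <- nrow(X)
  ni <- as.vector(table(group))
  covs <- lapply(split(as.data.frame(X), group), cov)
  # Pooled covariance matrix
  Sp <- Reduce(`+`, Map(function(S, n) (n - 1) * S, covs, ni)) / (N - g)
  logdets <- sapply(covs, function(S) log(det(S)))
  M <- (N - g) * log(det(Sp)) - sum((ni - 1) * logdets)
  # Scaling correction for the chi-square approximation
  C <- (sum(1 / (ni - 1)) - 1 / (N - g)) *
    (2 * p^2 + 3 * p - 1) / (6 * (p + 1) * (g - 1))
  stat <- (1 - C) * M
  df <- p * (p + 1) * (g - 1) / 2
  c(chisq = stat, df = df, p.value = pchisq(stat, df, lower.tail = FALSE))
}

res <- box_m(iris[, 1:4], iris$Species)  # stand-in data, not the plastic data
```

For the plastic data the p-value of 0.3093 gives no evidence against equal covariance matrices, which supports the pooled-covariance assumption behind MANOVA.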