Lesson 1: Measures of Central Tendency, Dispersion and Association

Variable Variance
Calcium 157829.443870502
Iron 35.810535582296
Protein 934.87688135283
Vitamin A 2668452.37064258
Vitamin C 5416.26407815779
Total 2832668.76600817
Example: Nutrient Intake Data - Generalized variance
              calcium       iron   protein   vitamin A  vitamin C
calcium   157829.4439  940.08944 6075.8163  102411.127  6701.6160
iron         940.0894   35.81054  114.0580    2383.153   137.6720
protein     6075.8163  114.05803  934.8769    7330.052   477.1998
vitamin A 102411.1266 2383.15341 7330.0515 2668452.371 22063.2486
vitamin C   6701.6160  137.67199  477.1998   22063.249  5416.2641
Generalized variance (determinant of the matrix above): 2.831042e+19
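The two summaries in this lesson can be reproduced from the printed matrix: the total variance is the sum of the diagonal entries, and the generalized variance is the determinant. The sketch below is only an illustration in plain Python; the helper names `total_variance` and `determinant` are hypothetical and not part of the course code. Because the matrix entries are rounded, the determinant it returns should only approximately match the printed 2.831042e+19.

```python
# Sample variance-covariance matrix as printed above
# (order: calcium, iron, protein, vitamin A, vitamin C).
S = [
    [157829.4439, 940.08944, 6075.8163, 102411.127, 6701.6160],
    [940.0894, 35.81054, 114.0580, 2383.153, 137.6720],
    [6075.8163, 114.05803, 934.8769, 7330.052, 477.1998],
    [102411.1266, 2383.15341, 7330.0515, 2668452.371, 22063.2486],
    [6701.6160, 137.67199, 477.1998, 22063.249, 5416.2641],
]


def total_variance(cov):
    """Total variance: the trace (sum of the diagonal) of the matrix."""
    return sum(cov[i][i] for i in range(len(cov)))


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


print("Total variance:", total_variance(S))
print("Generalized variance:", determinant(S))
```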

Lesson 2: Linear Combinations of Random Variables

Variable Mean
Calcium 624.04925
Iron 11.12990
Protein 65.80344
Vitamin A 839.63535
Vitamin C 78.92845
[1] "Estimated mean intake of the two vitamins:"
[1] 79.76808
calcium iron protein vitamin A vitamin C
calcium 157829.4439 940.08944 6075.8163 102411.127 6701.6160
iron 940.0894 35.81054 114.0580 2383.153 137.6720
protein 6075.8163 114.05803 934.8769 7330.052 477.1998
vitamin A 102411.1266 2383.15341 7330.0515 2668452.371 22063.2486
vitamin C 6701.6160 137.67199 477.1998 22063.249 5416.2641
[1] "The sample variance of Y is:"
[1] 5463.059

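Note: \(Y_1\) is the total intake of vitamins A and C, with vitamin A weighted by 0.001 so that both vitamins are on the same scale (the estimated mean above is 0.001 × 839.63535 + 78.92845 = 79.76808). \(Y_2\) is the total intake of calcium and iron.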

[1] "The sample covariance between Y_1 and Y_2:"
[1] 6944.082
[1] "The sample variance of Y_2 is:"
[1] 159745.4
[1] "The sample correlation between Y_1 and Y_2 is:"
[1] 0.2350621
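The quantities printed in this lesson follow from the rules for linear combinations: the mean of \(c'X\) is \(c'\bar{x}\), its variance is \(c'Sc\), and the covariance between two combinations is \(c_1'Sc_2\). The sketch below reproduces them in plain Python. The coefficient vectors `c1` and `c2` are assumptions inferred from the printed results (in particular the 0.001 weight on vitamin A), and the helper names are hypothetical.

```python
import math

# Means and covariance matrix as printed in the lessons above
# (order: calcium, iron, protein, vitamin A, vitamin C).
means = [624.04925, 11.12990, 65.80344, 839.63535, 78.92845]
S = [
    [157829.4439, 940.08944, 6075.8163, 102411.127, 6701.6160],
    [940.0894, 35.81054, 114.0580, 2383.153, 137.6720],
    [6075.8163, 114.05803, 934.8769, 7330.052, 477.1998],
    [102411.1266, 2383.15341, 7330.0515, 2668452.371, 22063.2486],
    [6701.6160, 137.67199, 477.1998, 22063.249, 5416.2641],
]

# Assumed coefficient vectors (inferred from the printed output):
c1 = [0.0, 0.0, 0.0, 0.001, 1.0]  # Y_1 = 0.001 * vitamin A + vitamin C
c2 = [1.0, 1.0, 0.0, 0.0, 0.0]    # Y_2 = calcium + iron


def dot(u, v):
    """Inner product u'v."""
    return sum(a * b for a, b in zip(u, v))


def bilinear(u, cov, v):
    """Bilinear form u' cov v."""
    return sum(u[i] * cov[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))


mean_y1 = dot(c1, means)
var_y1 = bilinear(c1, S, c1)
var_y2 = bilinear(c2, S, c2)
cov_y1_y2 = bilinear(c1, S, c2)
corr_y1_y2 = cov_y1_y2 / math.sqrt(var_y1 * var_y2)

print(mean_y1, var_y1, var_y2, cov_y1_y2, corr_y1_y2)
```

With the rounded inputs above, the results agree with the printed values 79.76808, 5463.059, 159745.4, 6944.082 and 0.2350621 to the displayed precision.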

Lesson 3: Graphical Display of Multivariate Data

Graphical Method

  • Univariate Cases (Histogram)

  • Bivariate Cases (Scatter Plots)

Correlation Matrix (Quarter Root Transformations)
S_S_calc S_S_iron S_S_prot S_S_a S_S_c
S_S_calc 1.0000000 0.5035233 0.5719948 0.5159994 0.3543776
S_S_iron 0.5035233 1.0000000 0.7355742 0.5028902 0.4214321
S_S_prot 0.5719948 0.7355742 1.0000000 0.4271488 0.3334733
S_S_a 0.5159994 0.5028902 0.4271488 1.0000000 0.4313927
S_S_c 0.3543776 0.4214321 0.3334733 0.4313927 1.0000000
  • Trivariate Cases (Rotating Scatter Plots)

  • Multivariate Cases (Matrix of Scatter Plots)

Lesson 4: Multivariate Normal Distribution

Variable Mean
Information 12.567568
Similarities 9.567568
Arithmetic 11.486486
Picture Completion 7.972973
info sim arith pict
info 11.474474 9.0855856 6.382883 2.0713213
sim 9.085586 12.0855856 5.938438 0.5435435
arith 6.382883 5.9384384 11.090090 1.7912913
pict 2.071321 0.5435435 1.791291 3.6936937

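The eigenvalues and corresponding eigenvectors are given below. Each row of the eigenvector table is the unit-length eigenvector for the eigenvalue in the same position, and each column gives its coefficient on one variable.

Eigenvalue
26.245278
6.255366
3.931553
1.911647
Eigenvector       info        sim      arith       pict
1           -0.6057467 -0.6047618 -0.5051337 -0.1103360
2            0.2176473  0.4958117 -0.7946452 -0.2744802
3            0.4605028 -0.3196759 -0.3349263  0.7573433
4            0.6112591 -0.5350152  0.0346888 -0.5821664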
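One way to check an eigen-decomposition is to confirm that \(Sv = \lambda v\) for each pair. The sketch below does this in plain Python, treating each row of the eigenvector table as one eigenvector, in the same order as the eigenvalues. The function name `eigen_residual` is hypothetical, and small residuals are expected because the printed values are rounded.

```python
import math

# Covariance matrix of the Wechsler data as printed above
# (order: info, sim, arith, pict).
S = [
    [11.474474, 9.0855856, 6.382883, 2.0713213],
    [9.085586, 12.0855856, 5.938438, 0.5435435],
    [6.382883, 5.9384384, 11.090090, 1.7912913],
    [2.071321, 0.5435435, 1.791291, 3.6936937],
]
eigenvalues = [26.245278, 6.255366, 3.931553, 1.911647]
# Each row is read as one eigenvector, matching the eigenvalue order.
eigenvectors = [
    [-0.6057467, -0.6047618, -0.5051337, -0.1103360],
    [0.2176473, 0.4958117, -0.7946452, -0.2744802],
    [0.4605028, -0.3196759, -0.3349263, 0.7573433],
    [0.6112591, -0.5350152, 0.0346888, -0.5821664],
]


def eigen_residual(cov, lam, v):
    """Largest absolute entry of (cov v - lam v); near 0 for an eigenpair."""
    sv = [sum(row[j] * v[j] for j in range(len(v))) for row in cov]
    return max(abs(sv[i] - lam * v[i]) for i in range(len(v)))


residuals = [eigen_residual(S, lam, v)
             for lam, v in zip(eigenvalues, eigenvectors)]
norms = [math.sqrt(sum(x * x for x in v)) for v in eigenvectors]
trace = sum(S[i][i] for i in range(len(S)))

print("Residuals:", residuals)
print("Vector lengths:", norms)
print("Trace vs. eigenvalue sum:", trace, sum(eigenvalues))
```

The last line uses the fact that the eigenvalues of a covariance matrix sum to its trace, which gives a second quick consistency check.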

Lesson 6: Multivariate Conditional Distribution and Partial Correlation

Example 6-2: Wechsler Adult Intelligence Data
info sim arith pict
info 1.0000000 0.7118787 0.2267109 0.3463270
sim 0.7118787 1.0000000 0.1890167 -0.2962762
arith 0.2267109 0.1890167 1.0000000 0.1758075
pict 0.3463270 -0.2962762 0.1758075 1.0000000

The partial correlation between Information and Similarities, given Arithmetic and Picture Completion, is r = 0.7118787.

info sim arith pict
info Inf 1.0240963 0.6413618 0.3296027
sim 1.0240963 Inf 0.5667183 0.0815325
arith 0.6413618 0.5667183 Inf 0.2875493
pict 0.3296027 0.0815325 0.2875493 Inf
Quantity              Value
Fisher Transform (z)  0.890982564683951
95% CI for z          (0.5445, 1.2375)
95% CI for r          (0.4964, 0.8447)

We are 95% confident that the interval (0.4964, 0.8447) contains the partial correlation between Information and Similarities scores, given scores on Arithmetic and Picture Completion.
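The interval above can be reproduced with Fisher's transformation: take \(z = \operatorname{atanh}(r)\), form a normal-theory interval with standard error \(1/\sqrt{n - 3 - k}\), and map the endpoints back with \(\tanh\). The sketch below assumes \(n = 37\) subjects and \(k = 2\) conditioning variables; neither value is stated in the output, but together they match the printed interval width. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def partial_corr_ci(r, n, k, level=0.95):
    """Fisher-z confidence interval for a partial correlation.

    r: sample partial correlation
    n: sample size (assumed here, not printed in the output)
    k: number of variables conditioned on
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3 - k)
    crit = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    z_lo, z_hi = z - crit * se, z + crit * se
    return z, (z_lo, z_hi), (math.tanh(z_lo), math.tanh(z_hi))


# Assumed n = 37 and k = 2 (Arithmetic and Picture Completion).
z, z_ci, r_ci = partial_corr_ci(0.7118787, n=37, k=2)
print("z =", z)
print("CI for z:", z_ci)
print("CI for r:", r_ci)
```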

Lesson 7: Inferences Regarding Multivariate Population Mean

The sample variance-covariance matrix:
calcium iron protein vitamin A vitamin C
calcium 157829.4439 940.08944 6075.8163 102411.127 6701.6160
iron 940.0894 35.81054 114.0580 2383.153 137.6720
protein 6075.8163 114.05803 934.8769 7330.052 477.1998
vitamin A 102411.1266 2383.15341 7330.0515 2668452.371 22063.2486
vitamin C 6701.6160 137.67199 477.1998 22063.249 5416.2641
Statistic Value
T^2 1758.5413
F 349.7968
df1 5.0000
df2 732.0000
p < 0.0001
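The F value in this table is a rescaling of Hotelling's \(T^2\): for a one-sample test with \(n\) observations on \(p\) variables, \(F = \frac{n-p}{p(n-1)}T^2\) with \(p\) and \(n-p\) degrees of freedom. The sketch below applies that conversion to the printed \(T^2\), taking \(n = 737\) and \(p = 5\) from the output. The function name is hypothetical.

```python
def one_sample_t2_to_f(t2, n, p):
    """Convert a one-sample Hotelling T^2 to its F statistic.

    Returns (F, df1, df2) with df1 = p and df2 = n - p.
    """
    f_stat = (n - p) / (p * (n - 1)) * t2
    return f_stat, p, n - p


# Nutrient intake data: n = 737 observations, p = 5 variables.
f_stat, df1, df2 = one_sample_t2_to_f(1758.5413, n=737, p=5)
print(f_stat, df1, df2)
```

The same conversion applies to a paired test, since that is a one-sample test on the differences. For example, `one_sample_t2_to_f(13.1278403, n=30, p=4)` gives about 2.9424 on 4 and 26 degrees of freedom, where \(n = 30\) is an assumption inferred from df2 = 26.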

The one-at-a-time confidence intervals are given by the columns “loone” and “upone”, the simultaneous confidence intervals by “losim” and “upsim”, and the Bonferroni intervals by “lobon” and “upbon”.

Variable n means var t1 tb f loone upone lobon upbon losim upsim
calcium 737 624.04925 1.578294e+05 1.963192 2.582526 2.22634 595.32008 652.77843 586.25681 661.84169 575.09118 673.00733
iron 737 11.12990 3.581054e+01 1.963192 2.582526 2.22634 10.69715 11.56265 10.56063 11.69917 10.39244 11.86735
protein 737 65.80344 9.348769e+02 1.963192 2.582526 2.22634 63.59235 68.01453 62.89481 68.71207 62.03547 69.57141
vitamin A 737 839.63535 2.668452e+06 1.963192 2.582526 2.22634 721.50572 957.76498 684.23906 995.03163 638.32780 1040.94289
vitamin C 737 78.92845 5.416264e+03 1.963192 2.582526 2.22634 73.60640 84.25050 71.92743 85.92946 69.85901 87.99788
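All three interval types in the table have the form mean ± multiplier × \(\sqrt{s^2/n}\) and differ only in the multiplier: the t critical value (`t1`) for one-at-a-time intervals, the Bonferroni-adjusted t value (`tb`) for Bonferroni intervals, and \(\sqrt{\frac{p(n-1)}{n-p}F}\) for simultaneous intervals. The sketch below recomputes the calcium row, taking the critical values from the table rather than computing them. The function name is hypothetical.

```python
import math


def mean_intervals(mean, var, n, p, t_one, t_bon, f_crit):
    """One-at-a-time, Bonferroni and simultaneous intervals for one mean.

    t_one, t_bon and f_crit are the critical values from the table
    (columns t1, tb and f); they are inputs, not computed here.
    """
    se = math.sqrt(var / n)
    sim_mult = math.sqrt(p * (n - 1) / (n - p) * f_crit)
    return {
        "one-at-a-time": (mean - t_one * se, mean + t_one * se),
        "bonferroni": (mean - t_bon * se, mean + t_bon * se),
        "simultaneous": (mean - sim_mult * se, mean + sim_mult * se),
    }


# Calcium row: mean, variance, n = 737, p = 5 and the printed critical values.
calcium = mean_intervals(624.04925, 157829.4439, 737, 5,
                         t_one=1.963192, t_bon=2.582526, f_crit=2.22634)
for name, (lo, hi) in calcium.items():
    print(f"{name}: ({lo:.5f}, {hi:.5f})")
```

The ordering of the widths (one-at-a-time narrowest, simultaneous widest) follows directly from the size of the three multipliers.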
            d1          d2         d3          d4
d1  0.82298851  0.07816092 -0.0137931 -0.05977011
d2  0.07816092  0.80919540 -0.2137931 -0.15632184
d3 -0.01379310 -0.21379310  0.5620690  0.51034483
d4 -0.05977011 -0.15632184  0.5103448  0.60229885
Test Statistics
Test Value
T^2 13.1278403
F 2.9424470
df1 4.0000000
df2 26.0000000
p 0.0393691
Simultaneous_L Simultaneous_U Bonferroni_L Bonferroni_U
d1 -0.5127078 0.6460411 -0.3744357 0.5077690
d2 -0.7078322 0.4411655 -0.5707236 0.3040570
d3 -0.7788034 0.1788034 -0.6645333 0.0645333
d4 -0.6289758 0.3623091 -0.5106869 0.2440202

Mean values for genuine and counterfeit notes
Group Length Left Right Bottom Top Diag
Genuine 214.969 129.943 129.720 8.305 10.168 141.517
Counterfeit 214.823 130.300 130.193 10.530 11.133 139.450
Variance-covariance Matrix for Genuine Notes
length left right bottom top diag
length 0.1502414 0.0580131 0.0572929 0.0571263 0.0144525 0.0054818
left 0.0580131 0.1325768 0.0858990 0.0566515 0.0490667 -0.0430616
right 0.0572929 0.0858990 0.1262626 0.0581818 0.0306465 -0.0237778
bottom 0.0571263 0.0566515 0.0581818 0.4132071 -0.2634747 -0.0001869
top 0.0144525 0.0490667 0.0306465 -0.2634747 0.4211879 -0.0753091
diag 0.0054818 -0.0430616 -0.0237778 -0.0001869 -0.0753091 0.1998091
Variance-covariance Matrix for Counterfeit Notes
length left right bottom top diag
length 0.1240111 0.0315152 0.0240010 -0.1005960 0.0194354 0.0115657
left 0.0315152 0.0650505 0.0467677 -0.0240404 -0.0119192 -0.0050505
right 0.0240010 0.0467677 0.0889404 -0.0185758 0.0001323 0.0341919
bottom -0.1005960 -0.0240404 -0.0185758 1.2813131 -0.4901919 0.2384848
top 0.0194354 -0.0119192 0.0001323 -0.4901919 0.4044556 -0.0220707
diag 0.0115657 -0.0050505 0.0341919 0.2384848 -0.0220707 0.3112121
Pooled Variance-Covariance Matrix
length left right bottom top diag
length 0.1371263 0.0447641 0.0406470 -0.0217348 0.0169439 0.0085237
left 0.0447641 0.0988136 0.0663333 0.0163056 0.0185737 -0.0240561
right 0.0406470 0.0663333 0.1076015 0.0198030 0.0153894 0.0052071
bottom -0.0217348 0.0163056 0.0198030 0.8472601 -0.3768333 0.1191490
top 0.0169439 0.0185737 0.0153894 -0.3768333 0.4128217 -0.0486899
diag 0.0085237 -0.0240561 0.0052071 0.1191490 -0.0486899 0.2555106
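The pooled matrix is the degrees-of-freedom-weighted average of the two group matrices, \(S_p = \frac{(n_1-1)S_1 + (n_2-1)S_2}{n_1+n_2-2}\). With 100 notes in each group it reduces to a simple average. The sketch below illustrates this on the length/left sub-block only; the function name is hypothetical.

```python
def pooled_covariance(s1, n1, s2, n2):
    """Pooled covariance: weighted average of two covariance matrices."""
    w1, w2 = n1 - 1, n2 - 1
    return [
        [(w1 * a + w2 * b) / (w1 + w2) for a, b in zip(row1, row2)]
        for row1, row2 in zip(s1, s2)
    ]


# Length/left sub-blocks of the matrices printed above.
S_genuine = [[0.1502414, 0.0580131],
             [0.0580131, 0.1325768]]
S_counterfeit = [[0.1240111, 0.0315152],
                 [0.0315152, 0.0650505]]

S_pooled = pooled_covariance(S_genuine, 100, S_counterfeit, 100)
print(S_pooled)
```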
              [,1]
diff1  0.117944052
diff2  0.354730710
diff3 -0.047179834
diff4  0.002835103
            diff1       diff2       diff3       diff4
diff1  0.19164212 -0.07101776  0.04511467 -0.03756199
diff2 -0.07101776  0.16538367 -0.17884359  0.02955601
diff3  0.04511467 -0.17884359  4.12372604 -3.75507101
diff4 -0.03756199  0.02955601 -3.75507101  4.39690660
Statistic Value
T^2 1030.795
F 256.648
df1 4.000
df2 733.000
p < 0.001
Confidence Intervals - Swiss Bank Notes
variable n1 xbar1 s21 n2 xbar2 s22 sp losim upsim lobon upbon
bottom 100 8.305 0.4132071 100 10.530 1.2813131 0.8472601 -2.6980943 -1.7519057 -2.5719164 -1.8780836
diag 100 141.517 0.1998091 100 139.450 0.3112121 0.2555106 1.8071972 2.3268028 1.8764886 2.2575114
left 100 129.943 0.1325768 100 130.300 0.0650505 0.0988136 -0.5185652 -0.1954348 -0.4754745 -0.2385255
length 100 214.969 0.1502414 100 214.823 0.1240111 0.1371263 -0.0443267 0.3363267 0.0064349 0.2855651
right 100 129.720 0.1262626 100 130.193 0.0889404 0.1076015 -0.6415965 -0.3044035 -0.5966305 -0.3493695
top 100 10.168 0.4211879 100 11.133 0.4044556 0.4128217 -1.2952331 -0.6347669 -1.2071574 -0.7228426
Simultaneous 95% Confidence Intervals - Swiss Bank Notes
variable losim upsim
bottom -2.6980943 -1.7519057
diag 1.8071972 2.3268028
left -0.5185652 -0.1954348
length -0.0443267 0.3363267
right -0.6415965 -0.3044035
top -1.2952331 -0.6347669

    Box's M-test for Homogeneity of Covariance Matrices

data:  swiss[, -1]
Chi-Sq (approx.) = 121.9, df = 21, p-value = 3.198e-16
Statistic Value
T^2 2412.451
F 391.922
df1 6.000
df2 193.000
p < 0.001
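For two groups the conversion from \(T^2\) to F is \(F = \frac{n_1+n_2-p-1}{p(n_1+n_2-2)}T^2\) with \(p\) and \(n_1+n_2-p-1\) degrees of freedom. The sketch below uses this standard pooled-covariance form, which reproduces the printed F for \(n_1 = n_2 = 100\) and \(p = 6\). Whether the course code computes F in exactly this way is an assumption, and the function name is hypothetical.

```python
def two_sample_t2_to_f(t2, n1, n2, p):
    """Convert a two-sample Hotelling T^2 to its F statistic.

    Returns (F, df1, df2) with df1 = p and df2 = n1 + n2 - p - 1.
    """
    df2 = n1 + n2 - p - 1
    f_stat = df2 / (p * (n1 + n2 - 2)) * t2
    return f_stat, p, df2


# Swiss bank notes: 100 genuine and 100 counterfeit notes, 6 measurements.
f_stat, df1, df2 = two_sample_t2_to_f(2412.451, n1=100, n2=100, p=6)
print(f_stat, df1, df2)
```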
Simultaneous (1 - α) × 100% Confidence Intervals
variable losim_adj upsim_adj
bottom -2.6868875 -1.7631125
diag 1.8133515 2.3206485
left -0.5147380 -0.1992620
length -0.0398182 0.3318182
right -0.6376027 -0.3083973
top -1.2874105 -0.6425895

Lesson 8: Multivariate Analysis of Variance (MANOVA)

  site       ral          rfe        rmg          rca         rna
1    L  1.835714  0.627857143 -0.5264286 -0.052142857  0.25928571
2    L  1.235714  0.707857143 -1.3964286 -0.082142857 -0.08071429
3    L  2.035714  0.717857143 -0.9464286 -0.072142857 -0.05071429
4    L -1.064286 -0.002142857  0.8135714 -0.042142857 -0.11071429
5    L  1.235714  0.687857143  0.5135714 -0.002142857 -0.05071429
6    L -1.664286 -0.112142857 -1.3564286 -0.032142857 -0.03071429

Test statistic: -2.131628e-14 
Degrees of freedom: 45 
Critical value ( 0.95 ): 61.65623 
Fail to reject the null hypothesis: Covariance matrices are homogeneous.
          Df    Wilks approx F num Df den Df   Pr(>F)    
site       3 0.012301   13.088     15 50.091 1.84e-12 ***
Residuals 22                                             
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 Response 1 :
            Df  Sum Sq Mean Sq F value    Pr(>F)    
site         3 175.610  58.537  26.669 1.627e-07 ***
Residuals   22  48.288   2.195                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 Response 2 :
            Df  Sum Sq Mean Sq F value    Pr(>F)    
site         3 134.222  44.741  89.883 1.679e-12 ***
Residuals   22  10.951   0.498                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 Response 3 :
            Df Sum Sq Mean Sq F value    Pr(>F)    
site         3 103.35  34.450   49.12 6.452e-10 ***
Residuals   22  15.43   0.701                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 Response 4 :
            Df   Sum Sq  Mean Sq F value    Pr(>F)    
site         3 0.204703 0.068234  29.157 7.546e-08 ***
Residuals   22 0.051486 0.002340                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 Response 5 :
            Df  Sum Sq  Mean Sq F value    Pr(>F)    
site         3 0.25825 0.086082  9.5026 0.0003209 ***
Residuals   22 0.19929 0.009059                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
        A  C  I   L
C+L-A-I 8 -2  8 -14
A_vs_I  1  0 -1   0
C_vs_L  0  1  0  -1
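With unequal group sizes, two contrasts \(c\) and \(d\) are orthogonal when \(\sum_i c_i d_i / n_i = 0\), and each contrast's coefficients must sum to zero. The sketch below checks both conditions for the three contrasts above. It assumes the columns follow the factor-level order A, C, I, L and that the site sample sizes are 5, 2, 5 and 14; these sizes are not printed in the output, though they total 26, consistent with the 3 + 22 degrees of freedom in the MANOVA table.

```python
from fractions import Fraction
from itertools import combinations

# Assumed column order A, C, I, L and assumed site sample sizes.
sample_sizes = [5, 2, 5, 14]
contrasts = {
    "C+L-A-I": [8, -2, 8, -14],
    "A_vs_I": [1, 0, -1, 0],
    "C_vs_L": [0, 1, 0, -1],
}


def weighted_product(c, d, sizes):
    """Sum of c_i * d_i / n_i, computed exactly; 0 means orthogonal."""
    return sum(Fraction(ci * di, ni) for ci, di, ni in zip(c, d, sizes))


# Each contrast must have coefficients that sum to zero.
coefficient_sums = {name: sum(c) for name, c in contrasts.items()}

# Every pair of contrasts should have a zero weighted product.
pairwise = {
    (a, b): weighted_product(contrasts[a], contrasts[b], sample_sizes)
    for a, b in combinations(contrasts, 2)
}

print(coefficient_sums)
print(pairwise)
```

Exact fractions are used so that orthogonality shows up as an exact zero rather than a small floating-point remainder.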

Call:
lm(formula = outcome ~ site, data = pottery)

Coefficients:
             [,1]     [,2]     [,3]     [,4]     [,5]   
(Intercept)  17.3200   1.5120   0.6060   0.0520   0.0480
siteC        -5.6200   3.9030   3.2490   0.2430   0.0020
siteI         0.8600   0.2000   0.0680  -0.0260   0.0060
siteL        -4.7557   4.8601   4.2204   0.1501   0.2027