This report contains seven parts:
1. Basic Visualisation of data - We have 50 readings on 4 continuous variables and one categorical variable - gender. We have 25 male and 25 female individuals.
2. Checking normality, detecting outliers - Both populations are multivariate normal after removing one outlier in the female set.
3. Confidence intervals and ellipsoids - We compute confidence intervals for the mean vector of each population.
4. Profile Analysis - We check the profile of the two populations and test for equality of means, flatness and parallel profiles.
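5. Principal Component Analysis - We run PCA on the four continuous variables; the first two components explain about 92% of the variation.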
6. Factor Analysis - We calculate loadings, factor scores and check the assumptions.
7. Discrimination and Classification - We perform LDA and QDA on the whole data set and with Lachenbruch’s holdout method. Because the covariance matrices of the two populations differ, QDA performs better.
We now load the required packages before beginning the first part: MVN, MASS, biotools, profileR, psych, car and ggbiplot.
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
## sROC 0.1-2 loaded
## Loading required package: rpanel
## Loading required package: tcltk
## Package `rpanel', version 1.1-4: type help(rpanel) for summary information
## Loading required package: tkrplot
## Loading required package: lattice
## Loading required package: SpatialEpi
## Loading required package: sp
## ---
## biotools version 3.1
##
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
## Loading required package: ggplot2
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
## Loading required package: RColorBrewer
## Loading required package: reshape
## Loading required package: lavaan
## This is lavaan 0.6-6
## lavaan is BETA software! Please report any bugs.
##
## Attaching package: 'lavaan'
## The following object is masked from 'package:psych':
##
## cor2cov
We load the data and show the first six rows.
## Resting (L/min) Resting (ml/kg/min) Max (L/min) Max (ml/kg/min) Gender
## 1 0.34 3.71 2.87 30.87 male
## 2 0.39 5.08 3.38 43.85 male
## 3 0.48 5.13 4.13 44.51 male
## 4 0.31 3.95 3.60 46.00 male
## 5 0.36 5.51 3.11 47.02 male
## 6 0.33 4.07 3.95 48.50 male
We look at the scatter matrix.
## 'data.frame': 50 obs. of 5 variables:
## $ Resting (L/min) : num 0.34 0.39 0.48 0.31 0.36 0.33 0.43 0.48 0.21 0.32 ...
## $ Resting (ml/kg/min): num 3.71 5.08 5.13 3.95 5.51 4.07 4.77 6.69 3.71 4.35 ...
## $ Max (L/min) : num 2.87 3.38 4.13 3.6 3.11 3.95 4.39 3.5 2.82 3.59 ...
## $ Max (ml/kg/min) : num 30.9 43.9 44.5 46 47 ...
## $ Gender : chr "male" "male" "male" "male" ...
We look at the density estimates to get an idea of the difference between the two populations.
## [1] "Resting (L/min)"
## [1] 0.34 0.39 0.48 0.31 0.36 0.33 0.43 0.48 0.21 0.32 0.54 0.32 0.40 0.31 0.44
## [16] 0.32 0.50 0.36 0.48 0.40 0.42 0.55 0.50 0.34 0.40 0.29 0.28 0.31 0.30 0.28
## [31] 0.11 0.25 0.26 0.39 0.37 0.31 0.35 0.29 0.33 0.18 0.28 0.44 0.22 0.34 0.30
## [46] 0.31 0.27 0.66 0.37 0.35
## Warning in rug(dataset[26:50, 2], col = "red"): some values will be clipped
To check normality we use the MVN package, which provides three multivariate tests (Mardia, Henze-Zirkler and Royston) in addition to the univariate Shapiro-Wilk test. We test the two populations separately.
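As a rough illustration of what these tests compute, the sketch below runs the univariate Shapiro-Wilk test with base R's `shapiro.test` and computes Mardia's skewness and kurtosis statistics by hand. The built-in `iris` data (one species) is only a stand-in for our data, and `mardia_test` is a hypothetical helper written for this example; it is not the MVN implementation, which may apply additional small-sample corrections.

```r
# Stand-in data: four numeric columns for one group of the built-in iris set
X <- as.matrix(iris[iris$Species == "setosa", 1:4])

# Univariate Shapiro-Wilk p-value for each variable
shapiro_p <- apply(X, 2, function(col) shapiro.test(col)$p.value)

# Mardia's multivariate skewness and kurtosis tests (hypothetical helper)
mardia_test <- function(X) {
  n <- nrow(X)
  p <- ncol(X)
  Xc <- scale(X, center = TRUE, scale = FALSE)
  S <- crossprod(Xc) / n                 # ML estimate of the covariance matrix
  D <- Xc %*% solve(S) %*% t(Xc)         # Mahalanobis cross-products
  b1 <- sum(D^3) / n^2                   # multivariate skewness
  b2 <- sum(diag(D)^2) / n               # multivariate kurtosis
  skew_stat <- n * b1 / 6
  skew_df <- p * (p + 1) * (p + 2) / 6
  kurt_stat <- (b2 - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)
  list(
    skew_stat = skew_stat,
    skew_df = skew_df,
    skew_p = pchisq(skew_stat, skew_df, lower.tail = FALSE),
    kurt_stat = kurt_stat,
    kurt_p = 2 * pnorm(abs(kurt_stat), lower.tail = FALSE)
  )
}

res <- mardia_test(X)
print(round(shapiro_p, 4))
print(res)
```

With four variables the skewness statistic is referred to a chi-square distribution with 20 degrees of freedom, and the kurtosis statistic to a standard normal.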
For males, all four variables pass the univariate Shapiro-Wilk test. The data pass the Mardia and Royston multivariate tests but fail the Henze-Zirkler test (p = 0.037). Results below.
## $multivariateNormality
## Test HZ p value MVN
## 1 Henze-Zirkler 0.9180322 0.03699581 NO
##
## $univariateNormality
## Test Variable Statistic p value Normality
## 1 Shapiro-Wilk Resting (L/min) 0.9612 0.4396 YES
## 2 Shapiro-Wilk Resting (ml/kg/min) 0.9672 0.5753 YES
## 3 Shapiro-Wilk Max (L/min) 0.9316 0.0948 YES
## 4 Shapiro-Wilk Max (ml/kg/min) 0.9396 0.1450 YES
##
## $Descriptives
## n Mean Std.Dev Median Min Max 25th 75th
## Resting (L/min) 25 0.3972 0.08438602 0.40 0.21 0.55 0.33 0.48
## Resting (ml/kg/min) 25 5.3296 1.06966303 5.13 3.71 7.89 4.58 6.04
## Max (L/min) 25 3.6876 0.67518689 3.56 2.82 5.23 3.11 4.00
## Max (ml/kg/min) 25 49.4204 7.43317871 48.92 30.87 63.30 46.23 55.08
## Skew Kurtosis
## Resting (L/min) 0.02883229 -0.7707939
## Resting (ml/kg/min) 0.33539338 -0.6496398
## Max (L/min) 0.68995043 -0.5116999
## Max (ml/kg/min) -0.62720924 0.4864554
## $multivariateNormality
## Test Statistic p value Result
## 1 Mardia Skewness 24.7357672005174 0.211727532300499 YES
## 2 Mardia Kurtosis -0.545630313026931 0.585320083024567 YES
## 3 MVN <NA> <NA> YES
## $multivariateNormality
## Test H p value MVN
## 1 Royston 6.275151 0.1749181 YES
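For females, only two of the four variables pass the univariate Shapiro-Wilk test. The data pass the Mardia and Henze-Zirkler multivariate tests but fail the Royston test. Note that the descriptives below report n = 26 rather than 25, so the subset used here appears to include one extra row. Results below.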
## $multivariateNormality
## Test HZ p value MVN
## 1 Henze-Zirkler 0.8897817 0.0597649 YES
##
## $univariateNormality
## Test Variable Statistic p value Normality
## 1 Shapiro-Wilk Resting (L/min) 0.8723 0.0040 NO
## 2 Shapiro-Wilk Resting (ml/kg/min) 0.8401 0.0009 NO
## 3 Shapiro-Wilk Max (L/min) 0.9704 0.6330 YES
## 4 Shapiro-Wilk Max (ml/kg/min) 0.9501 0.2334 YES
##
## $Descriptives
## n Mean Std.Dev Median Min Max 25th 75th
## Resting (L/min) 26 0.3169231 0.09813335 0.305 0.11 0.66 0.2800 0.3500
## Resting (ml/kg/min) 26 5.1557692 1.63805537 5.070 1.74 11.05 4.5550 5.5425
## Max (L/min) 26 2.3346154 0.35424546 2.320 1.71 3.06 2.0475 2.5075
## Max (ml/kg/min) 26 37.9365385 4.85478975 38.085 28.97 51.80 35.2575 39.4300
## Skew Kurtosis
## Resting (L/min) 1.2400422 3.8401198
## Resting (ml/kg/min) 1.4206652 4.5251063
## Max (L/min) 0.2901591 -0.6692302
## Max (ml/kg/min) 0.5957739 0.8991398
## $multivariateNormality
## Test Statistic p value Result
## 1 Mardia Skewness 30.4959073335655 0.0622068480637173 YES
## 2 Mardia Kurtosis 1.21646295329905 0.223808615267926 YES
## 3 MVN <NA> <NA> YES
## $multivariateNormality
## Test H p value MVN
## 1 Royston 22.56494 8.12886e-05 NO
We check for outliers using the Mahalanobis distance and find that the 48th reading is a strong outlier candidate (at the 99% level).
## [1] 48
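A minimal sketch of this kind of outlier check, using only base R. The data here are simulated stand-ins (not our readings), with one gross outlier injected at row 23; the 99% chi-square cutoff mirrors the level mentioned above.

```r
set.seed(1)

# Simulated stand-in: 25 observations on 4 roughly normal variables
X <- matrix(rnorm(25 * 4), nrow = 25, ncol = 4)
X[23, ] <- c(10, 10, 10, 10)   # inject one gross outlier at row 23

# Squared Mahalanobis distance of every row from the sample mean
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Under multivariate normality d2 is approximately chi-square with p df
cutoff <- qchisq(0.99, df = ncol(X))
flagged <- which(d2 > cutoff)
print(flagged)
```

Because the outlier itself inflates the sample covariance, robust estimates of centre and scatter are sometimes preferred; this sketch keeps the classical estimates for simplicity.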
So we remove the 48th reading from the female population and check normality again. All four variables now pass the univariate test, and the data pass all three multivariate normality tests.
## $multivariateNormality
## Test Statistic p value Result
## 1 Mardia Skewness 30.3724577468973 0.0640391265398032 YES
## 2 Mardia Kurtosis 0.358992219117516 0.719600910083357 YES
## 3 MVN <NA> <NA> YES
##
## $univariateNormality
## Test Variable Statistic p value Normality
## 1 Shapiro-Wilk Resting (L/min) 0.9560 0.3642 YES
## 2 Shapiro-Wilk Resting (ml/kg/min) 0.9487 0.2542 YES
## 3 Shapiro-Wilk Max (L/min) 0.9622 0.4838 YES
## 4 Shapiro-Wilk Max (ml/kg/min) 0.9483 0.2491 YES
##
## $Descriptives
## n Mean Std.Dev Median Min Max 25th 75th
## Resting (L/min) 24 0.2991667 0.06870964 0.300 0.11 0.44 0.2775 0.3425
## Resting (ml/kg/min) 24 4.9341667 1.15774487 5.070 1.74 7.32 4.5275 5.4275
## Max (L/min) 24 2.3150000 0.35460940 2.315 1.71 3.06 2.0175 2.5025
## Max (ml/kg/min) 24 38.1054167 4.92021825 38.085 28.97 51.80 35.6325 39.5800
## Skew Kurtosis
## Resting (L/min) -0.5865727 0.8504798
## Resting (ml/kg/min) -0.5767599 1.0202208
## Max (L/min) 0.3657211 -0.5682945
## Max (ml/kg/min) 0.5771586 0.8545083
## $multivariateNormality
## Test H p value MVN
## 1 Royston 6.361645 0.1422686 YES
## $multivariateNormality
## Test HZ p value MVN
## 1 Henze-Zirkler 0.8187303 0.1449586 YES
So both populations can now be considered multivariate normal.
We do not transform the data, as they are sufficiently normal after removing the 48th reading. A Box-Cox transformation does not improve normality, so we omit it from the report.
### Females
## [1] 0.3972 5.3296 3.6876 49.4204
## [1] 55.25215
## lower_limit upper_limit
## 1 -4.1887339 4.983134
## 2 0.7436661 9.915534
## 3 -0.8983339 8.273534
## 4 44.8344661 54.006334
## [1] 0.3136 5.1788 2.3152 38.1548
## [1] 23.26083
## lower_limit upper_limit
## 1 -2.6619399 3.28914
## 2 2.2032601 8.15434
## 3 -0.6603399 5.29074
## 4 35.1792601 41.13034
## [1] 0.3972 5.3296 3.6876 49.4204
## [1] 55.25215
## lower_limit upper_limit
## 1 -5.00073619 5.795136
## 2 -0.06833619 10.727536
## 3 -1.71033619 9.085536
## 4 44.02246381 54.818336
## [1] 0.3136 5.1788 2.3152 38.1548
## [1] 23.26083
## lower_limit upper_limit
## 1 -3.1888 3.8160
## 2 1.6764 8.6812
## 3 -1.1872 5.8176
## 4 34.6524 41.6572
## [1] 0.3972 5.3296 3.6876 49.4204
## [1] 55.25215
## lower_limit upper_limit
## 1 -2.281192 3.075592
## 2 2.651208 8.007992
## 3 1.009208 6.365992
## 4 46.742008 52.098792
## [1] 0.3136 5.1788 2.3152 38.1548
## [1] 23.26083
## lower_limit upper_limit
## 1 -1.4242494 2.051449
## 2 3.4409506 6.916649
## 3 0.5773506 4.053049
## 4 36.4169506 39.892649
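For reference, the sketch below shows one common way to compute simultaneous confidence intervals for a mean vector: Hotelling T² intervals and Bonferroni intervals. It is an illustration under assumptions, not the code that produced the output above: the built-in `iris` data is a stand-in, and the choice of these two interval types is ours. Note that each block above has the same half-width for all four variables, whereas the sketch uses each variable's own standard error, which is the usual choice.

```r
# Stand-in data: one group of the built-in iris set
X <- as.matrix(iris[iris$Species == "setosa", 1:4])
n <- nrow(X)
p <- ncol(X)
alpha <- 0.05

xbar <- colMeans(X)
se <- sqrt(diag(cov(X)) / n)   # each variable uses its own variance

# Critical constant for Hotelling T^2 simultaneous intervals
c_t2 <- sqrt(p * (n - 1) / (n - p) * qf(1 - alpha, p, n - p))

# Critical constant for Bonferroni intervals (p intervals, two-sided)
c_bon <- qt(1 - alpha / (2 * p), df = n - 1)

t2_ci <- data.frame(lower_limit = xbar - c_t2 * se,
                    upper_limit = xbar + c_t2 * se)
bon_ci <- data.frame(lower_limit = xbar - c_bon * se,
                     upper_limit = xbar + c_bon * se)

print(t2_ci)
print(bon_ci)
```

The T² intervals hold simultaneously for all linear combinations of the means and are therefore wider than the Bonferroni intervals, which only cover the p individual means.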
We run the profile analysis function and plot the profiles. Group 1 is male and group 0 is female.
The sample means are displayed first.
##
## Data Summary:
## 0 1
## v1 0.3136 0.3972
## v2 5.1788 5.3296
## v3 2.3152 3.6876
## v4 38.1548 49.4204
Display the test outputs.
## Call:
## pbg(data = dataset[1:4], group = c(rep(1, 25), rep(0, 25)), profile.plot = TRUE)
##
## Hypothesis Tests:
## $`Ho: Profiles are parallel`
## Multivariate.Test Statistic Approx.F num.df den.df p.value
## 1 Wilks 0.3728944 25.78644 3 46 6.17633e-10
## 2 Pillai 0.6271056 25.78644 3 46 6.17633e-10
## 3 Hotelling-Lawley 1.6817242 25.78644 3 46 6.17633e-10
## 4 Roy 1.6817242 25.78644 3 46 6.17633e-10
##
## $`Ho: Profiles have equal levels`
## Df Sum Sq Mean Sq F value Pr(>F)
## group 1 129.4 129.4 40.47 7.01e-08 ***
## Residuals 48 153.6 3.2
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## $`Ho: Profiles are flat`
## F df1 df2 p-value
## 1 875.8325 3 46 1.436427e-40
All three null hypotheses are rejected: the profiles are not parallel, do not have equal levels and are not flat.
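The parallelism test can be reproduced by hand as a two-sample Hotelling T² test on successive differences of the variables. The sketch below is a base R illustration of that idea, not the profileR implementation; it uses two species of the built-in `iris` data as stand-in groups.

```r
# Stand-in groups: two species of the built-in iris data
g1 <- as.matrix(iris[iris$Species == "setosa", 1:4])
g2 <- as.matrix(iris[iris$Species == "versicolor", 1:4])
n1 <- nrow(g1)
n2 <- nrow(g2)
p <- ncol(g1)

# Contrast matrix of successive differences: row j is x[j + 1] - x[j]
C <- cbind(-diag(p - 1), 0) + cbind(0, diag(p - 1))

d <- colMeans(g1) - colMeans(g2)
Sp <- ((n1 - 1) * cov(g1) + (n2 - 1) * cov(g2)) / (n1 + n2 - 2)  # pooled covariance

# Hotelling T^2 for H0: C (mu1 - mu2) = 0, i.e. parallel profiles
Cd <- C %*% d
T2 <- (n1 * n2 / (n1 + n2)) * drop(t(Cd) %*% solve(C %*% Sp %*% t(C)) %*% Cd)

# Convert to an F statistic with (p - 1, n1 + n2 - p) degrees of freedom
df1 <- p - 1
df2 <- n1 + n2 - p
F_stat <- T2 * df2 / ((n1 + n2 - 2) * df1)
p_value <- pf(F_stat, df1, df2, lower.tail = FALSE)

print(c(T2 = T2, F = F_stat, df1 = df1, df2 = df2, p.value = p_value))
```

For our data (p = 4, 50 readings) the same formula gives 3 and 46 degrees of freedom, which matches the parallelism table above.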
We perform PCA on the four continuous variables using ‘prcomp’ function and check results.
## Importance of components:
## PC1 PC2 PC3 PC4
## Standard deviation 1.5698 1.1136 0.53994 0.06261
## Proportion of Variance 0.6161 0.3100 0.07288 0.00098
## Cumulative Proportion 0.6161 0.9261 0.99902 1.00000
Only two of the standard deviations (and hence eigenvalues) are greater than one, so we retain two PCs; together they explain about 92.6% of the variation.
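A small sketch of this retention rule, using `prcomp` on standardised variables. The built-in `iris` measurements are a stand-in for our four variables, so the number of retained components here need not match ours.

```r
# Stand-in data: four numeric columns of the built-in iris set
X <- iris[, 1:4]

# PCA on standardised variables (correlation-matrix PCA)
pca <- prcomp(X, center = TRUE, scale. = TRUE)

eigenvalues <- pca$sdev^2                 # variances of the components
prop_var <- eigenvalues / sum(eigenvalues)
cum_var <- cumsum(prop_var)

# Rule used above: keep components whose standard deviation exceeds 1
n_keep <- sum(pca$sdev > 1)

print(summary(pca))
print(n_keep)
```

For standardised variables the eigenvalues sum to the number of variables, so an eigenvalue above 1 means the component explains more than a single original variable does.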
We can plot the scree plot as well.
These are the eigenvectors (loadings).
## PC1 PC2 PC3 PC4
## Resting (L/min) 0.5535272 0.3738597 -0.4912732 0.5590055
## Resting (ml/kg/min) 0.4184354 0.6475264 0.4040129 -0.4923363
## Max (L/min) 0.5131163 -0.4885243 -0.4302058 -0.5594449
## Max (ml/kg/min) 0.5052041 -0.4497583 0.6405835 0.3635095
Now we see the correlation between the first two PCs and the original variables.
## r1 r2
## 1 0.8689537 0.4163337
## 2 0.6568801 0.7210915
## 3 0.8055147 -0.5440253
## 4 0.7930937 -0.5008550
We can see that the first PC has large positive correlations with all four variables but the second one has positive correlation with the resting oxygen consumption rates and negative with the max oxygen consumption rates.
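For standardised variables, the correlation between variable i and component k equals the loading times the component's standard deviation; for example, 0.5535 × 1.5698 ≈ 0.869 reproduces the first entry above. The sketch below checks this identity on stand-in data (built-in `iris`), comparing the formula with correlations computed directly from the scores.

```r
# Stand-in data: four numeric columns of the built-in iris set
X <- iris[, 1:4]
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Formula: loading e_ik multiplied by the standard deviation of component k
r_formula <- sweep(pca$rotation[, 1:2], 2, pca$sdev[1:2], `*`)

# Direct computation: correlate original variables with the PC scores
r_direct <- cor(X, pca$x[, 1:2])

print(round(r_formula, 4))
print(round(r_direct, 4))
```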
Next we look at the biplot. The arrows corresponding to the variables confirm the correlation findings above. Each individual's scores on the first and second PCs are also plotted on this plane, and we indicate the known grouping of the individuals (male and female).
##
## Attaching package: 'plyr'
## The following objects are masked from 'package:reshape':
##
## rename, round_any
##
## Attaching package: 'scales'
## The following objects are masked from 'package:psych':
##
## alpha, rescale
We can see from the plot that the 48th reading indeed looks like an outlier.
For males.
## Importance of components:
## PC1 PC2 PC3 PC4
## Standard deviation 1.5607 1.0027 0.7458 0.05353
## Proportion of Variance 0.6089 0.2513 0.1390 0.00072
## Cumulative Proportion 0.6089 0.8602 0.9993 1.00000
For females.
## Importance of components:
## PC1 PC2 PC3 PC4
## Standard deviation 1.4130 1.2657 0.63237 0.03967
## Proportion of Variance 0.4992 0.4005 0.09997 0.00039
## Cumulative Proportion 0.4992 0.8996 0.99961 1.00000
We try a two-factor model, as the PCA revealed two eigenvalues greater than 1. We use maximum likelihood estimation because the iterative principal component method produces warnings. We present the loadings without rotation first.
##
## Loadings:
## ML2 ML1
## Resting (L/min) 0.473 0.808
## Resting (ml/kg/min) 0.997
## Max (L/min) 0.995
## Max (ml/kg/min) 0.815 0.220
##
## ML2 ML1
## SS loadings 1.877 1.702
## Proportion Var 0.469 0.426
## Cumulative Var 0.469 0.895
We see the first factor loads positively on all variables whereas the second factor loads negatively on Resting rates and positively on Max rates.
Next we fit the orthogonal model again with Varimax rotation and see the factor loadings.
##
## Loadings:
## ML2 ML1
## Resting (L/min) 0.473 0.808
## Resting (ml/kg/min) 0.997
## Max (L/min) 0.995
## Max (ml/kg/min) 0.815 0.220
##
## ML2 ML1
## SS loadings 1.877 1.702
## Proportion Var 0.469 0.426
## Cumulative Var 0.469 0.895
We see that the first factor loads mainly on the 3rd and 4th variables (and also on the 1st), whereas the second factor loads mainly on the 1st and 2nd variables. So we see a separation between the Resting and Max oxygen consumption rates.
We also calculate the uniquenesses.
## Resting (L/min) Resting (ml/kg/min) Max (L/min) Max (ml/kg/min)
## 0.123581232 0.004970114 0.004907360 0.288003402
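In an orthogonal factor model the uniqueness of a variable is one minus its communality, the sum of its squared loadings. The sketch below checks this against the printed values for the two variables whose loadings are both shown in the table above (small loadings are suppressed in that printout, so the other two rows cannot be reproduced from it). The loadings are rounded to three decimals, so agreement is only approximate.

```r
# Loadings copied from the printed table (rounded to three decimals)
L <- rbind(
  "Resting (L/min)" = c(ML2 = 0.473, ML1 = 0.808),
  "Max (ml/kg/min)" = c(ML2 = 0.815, ML1 = 0.220)
)

communality <- rowSums(L^2)      # variance explained by the common factors
uniqueness <- 1 - communality    # variance left to the specific factor

# Uniquenesses reported above for the same two variables
reported <- c(0.123581232, 0.288003402)

print(round(cbind(uniqueness, reported), 4))
```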
Next we calculate the factor scores.
## ML2 ML1
## 1 -0.0713375 -1.08817771
## 2 0.4428130 -0.12371747
## 3 1.2986834 -0.08484044
## 4 0.7544542 -0.94732054
## 5 0.1116231 0.17846286
## 6 1.1477198 -0.86731054
First we check the covariance matrices of the two groups - male and female. They look different.
## Resting (L/min) Resting (ml/kg/min) Max (L/min)
## Resting (L/min) 0.00712100 0.0700030 0.03144717
## Resting (ml/kg/min) 0.07000300 1.1441790 0.14767817
## Max (L/min) 0.03144717 0.1476782 0.45587733
## Max (ml/kg/min) 0.15058033 3.4309085 3.30812183
## Max (ml/kg/min)
## Resting (L/min) 0.1505803
## Resting (ml/kg/min) 3.4309085
## Max (L/min) 3.3081218
## Max (ml/kg/min) 55.2521457
## Resting (L/min) Resting (ml/kg/min) Max (L/min)
## Resting (L/min) 0.009630154 0.14593446 0.005678769
## Resting (ml/kg/min) 0.145934462 2.68322538 -0.049495692
## Max (L/min) 0.005678769 -0.04949569 0.125489846
## Max (ml/kg/min) 0.009708923 1.36016477 0.944044615
## Max (ml/kg/min)
## Resting (L/min) 0.009708923
## Resting (ml/kg/min) 1.360164769
## Max (L/min) 0.944044615
## Max (ml/kg/min) 23.568983538
They look different. We check again after removing the outlier, i.e. the 48th reading.
## Resting (L/min) Resting (ml/kg/min) Max (L/min)
## Resting (L/min) 0.00712100 0.0700030 0.03144717
## Resting (ml/kg/min) 0.07000300 1.1441790 0.14767817
## Max (L/min) 0.03144717 0.1476782 0.45587733
## Max (ml/kg/min) 0.15058033 3.4309085 3.30812183
## Max (ml/kg/min)
## Resting (L/min) 0.1505803
## Resting (ml/kg/min) 3.4309085
## Max (L/min) 3.3081218
## Max (ml/kg/min) 55.2521457
## Resting (L/min) Resting (ml/kg/min) Max (L/min)
## Resting (L/min) 0.004721014 0.06867754 0.004273913
## Resting (ml/kg/min) 0.068677536 1.34037319 -0.042439130
## Max (L/min) 0.004273913 -0.04243913 0.125747826
## Max (ml/kg/min) 0.012456884 1.02122862 1.145636957
## Max (ml/kg/min)
## Resting (L/min) 0.01245688
## Resting (ml/kg/min) 1.02122862
## Max (L/min) 1.14563696
## Max (ml/kg/min) 24.20854764
To be sure, we use Box's M-test for equality of covariance matrices. This test is sensitive to departures from multivariate normality, so we use the data after deleting the outlier (the 48th reading).
##
## Box's M-test for Homogeneity of Covariance Matrices
##
## data: dataset[-48, 1:4]
## Chi-Sq (approx.) = 47.6, df = 10, p-value = 7.343e-07
The test rejects equality of the covariance matrices (p < 0.001), suggesting that LDA may not be suitable and that QDA is more appropriate. However, we do both.
The prior probabilities are 0.5 and 0.5, as the two groups, male and female, are evenly split in the data. The two costs of misclassification can also be assumed equal because there is no reason to suppose otherwise.
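To make the Box's M statistic less of a black box, here is a hand-rolled base R sketch of the usual chi-square approximation. `box_m` is a hypothetical helper written for this illustration, not the biotools function, and two species of the built-in `iris` data stand in for our two groups.

```r
# Box's M-test with the chi-square approximation (hypothetical helper)
box_m <- function(X, group) {
  X <- as.matrix(X)
  group <- factor(group)
  p <- ncol(X)
  g <- nlevels(group)
  N <- nrow(X)
  n_i <- as.vector(table(group))
  covs <- lapply(levels(group), function(l) cov(X[group == l, , drop = FALSE]))
  pooled <- Reduce(`+`, Map(function(S, n) (n - 1) * S, covs, n_i)) / (N - g)
  logdet <- function(S) as.numeric(determinant(S, logarithm = TRUE)$modulus)
  M <- (N - g) * logdet(pooled) - sum((n_i - 1) * sapply(covs, logdet))
  # Small-sample correction factor
  corr <- (sum(1 / (n_i - 1)) - 1 / (N - g)) *
    (2 * p^2 + 3 * p - 1) / (6 * (p + 1) * (g - 1))
  stat <- (1 - corr) * M
  df <- p * (p + 1) * (g - 1) / 2
  list(statistic = stat, df = df,
       p.value = pchisq(stat, df, lower.tail = FALSE))
}

# Stand-in data: two species, four variables
two <- droplevels(iris[iris$Species != "setosa", ])
res <- box_m(two[, 1:4], two$Species)
print(res)
```

With four variables and two groups the degrees of freedom are 4 × 5 × 1 / 2 = 10, as in the output above.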
We perform LDA on the entire data set and use it to predict the entire data set. We check the confusion matrix and calculate the APER.
##
## dataset.predict female male
## female 23 1
## male 2 24
## [1] 0.94
The confusion matrix is shown above. The APER is 6%, i.e. a success rate of 94%.
We can plot it also.
Next we perform LDA again, this time with the Lachenbruch hold-out procedure, and print the confusion matrix and success rate. The success rate drops to 90%.
##
## female male
## female 23 3
## male 2 22
## [1] 0.9
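The difference between the two error estimates can be sketched in base R as follows. This is an illustration with equal priors and equal costs, written by hand rather than with the MASS functions; `lda_classify` is a hypothetical helper and two species of the built-in `iris` data stand in for our groups.

```r
# Stand-in data: two species, four variables
two <- droplevels(iris[iris$Species != "setosa", ])
X <- as.matrix(two[, 1:4])
y <- two$Species

# Equal-prior, equal-cost linear rule: fit on training rows, classify new rows
lda_classify <- function(Xtrain, ytrain, Xnew) {
  lv <- levels(ytrain)
  A <- Xtrain[ytrain == lv[1], , drop = FALSE]
  B <- Xtrain[ytrain == lv[2], , drop = FALSE]
  Sp <- ((nrow(A) - 1) * cov(A) + (nrow(B) - 1) * cov(B)) / (nrow(Xtrain) - 2)
  a <- solve(Sp, colMeans(A) - colMeans(B))       # discriminant coefficients
  m <- sum(a * (colMeans(A) + colMeans(B))) / 2   # midpoint between group means
  factor(ifelse(drop(Xnew %*% a) >= m, lv[1], lv[2]), levels = lv)
}

# Resubstitution: train and predict on the same data (gives the APER)
pred_all <- lda_classify(X, y, X)
aper <- mean(pred_all != y)

# Lachenbruch holdout: leave each row out, refit, then classify that row
pred_loo <- y
for (i in seq_len(nrow(X))) {
  pred_loo[i] <- lda_classify(X[-i, , drop = FALSE], y[-i], X[i, , drop = FALSE])
}
holdout_error <- mean(pred_loo != y)

print(table(predicted = pred_all, actual = y))
print(c(APER = aper, holdout = holdout_error))
```

The APER tends to be optimistic because each observation helps build the rule that classifies it; the holdout estimate removes that advantage.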
We repeat the above using QDA, but delete the 48th data point for the sake of normality.
## [1] male male male male male male male male female male
## [11] male male male male male male male male male male
## [21] male male male male male female female female female female
## [31] female female female female female female female female male female
## [41] female female female female female female female female female
## Levels: female male
##
## dataset.predict female male
## female 23 1
## male 1 24
## [1] 0.9591837
The confusion matrix shows a success rate of about 96%, better than LDA. With the Lachenbruch hold-out procedure the success rate is about 92%.
##
## female male
## female 22 2
## male 2 23
## [1] 0.9183673
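The quadratic rule differs from the linear one only in that each group keeps its own covariance matrix. A base R sketch with equal priors follows; `qda_classify` is a hypothetical helper (not the MASS function), and two species of the built-in `iris` data again stand in for our groups.

```r
# Stand-in data: two species, four variables
two <- droplevels(iris[iris$Species != "setosa", ])
X <- as.matrix(two[, 1:4])
y <- two$Species

# Equal-prior quadratic rule: assign each row to the group with the larger score
qda_classify <- function(Xtrain, ytrain, Xnew) {
  lv <- levels(ytrain)
  scores <- sapply(lv, function(l) {
    G <- Xtrain[ytrain == l, , drop = FALSE]
    S <- cov(G)                                   # group-specific covariance
    -0.5 * as.numeric(determinant(S, logarithm = TRUE)$modulus) -
      0.5 * mahalanobis(Xnew, colMeans(G), S)
  })
  scores <- matrix(scores, ncol = length(lv))     # keep matrix shape for one row
  factor(lv[max.col(scores, ties.method = "first")], levels = lv)
}

# Resubstitution success rate
pred_all <- qda_classify(X, y, X)
success_all <- mean(pred_all == y)

# Lachenbruch holdout success rate
pred_loo <- y
for (i in seq_len(nrow(X))) {
  pred_loo[i] <- qda_classify(X[-i, , drop = FALSE], y[-i], X[i, , drop = FALSE])
}
success_loo <- mean(pred_loo == y)

print(table(predicted = pred_all, actual = y))
print(c(resubstitution = success_all, holdout = success_loo))
```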