A zoologist collected lizards in the southwestern United States. Among other variables, he measured mass (in grams) and snout-vent length (in millimeters). Because tails sometimes break off in the wild, snout-vent length is a more representative measure of length. The data for lizards from two genera, Cnemidophorus (C) and Sceloporus (S), were collected in 1997 and 1999.
This activity aims to replicate the findings of example 6.71, showing that the two genera differ in their mean mass and snout-vent length, using Hotelling's T² multivariate test for independent samples. I included tests of the analysis assumptions as well as a little exploratory analysis. When each variable is tested on its own, no difference is found, so the example highlights the need for multivariate tests.
Summary statistics across types

| Characteristic | C, N = 20 | S, N = 40 |
|---|---|---|
| Mass | | |
| Mean (SD) | 2.24 (0.59) | 2.37 (0.71) |
| Median (25%, 75%) | 2.50 (1.74, 2.78) | 2.29 (1.84, 2.72) |
| Range | 0.88, 2.90 | 1.11, 3.76 |
| Snout-vent length | | |
| Mean (SD) | 4.39 (0.16) | 4.31 (0.21) |
| Median (25%, 75%) | 4.44 (4.27, 4.52) | 4.26 (4.15, 4.42) |
| Range | 4.03, 4.60 | 3.95, 4.74 |
Normal Distribution plots of Mass for Type C
Note that the lines in the plots above show what a normal distribution fitted to the data would look like. The points and the histogram are what we use to check whether that assumed distribution is a good fit. The mass of the type C lizards is not normally distributed: the histogram is highly skewed and appears to have two peaks, and the points do not follow the reference lines in any satisfactory or consistent manner.
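The visual checks described above can be sketched in base R. This is a minimal sketch, not the code that produced the plots in this post; the vector `obs` is simulated placeholder data standing in for one variable of one group.

```r
# Placeholder data: simulated values standing in for one variable of one group
# (NOT the lizard measurements).
set.seed(1)
obs <- rnorm(20, mean = 2.2, sd = 0.6)

# Parameters of the normal distribution implied by the sample
m <- mean(obs)
s <- sd(obs)

op <- par(mfrow = c(1, 2))

# Histogram on the density scale with the fitted normal curve on top
hist(obs, freq = FALSE, main = "Histogram with fitted normal", xlab = "value")
curve(dnorm(x, mean = m, sd = s), add = TRUE, lwd = 2)

# Normal Q-Q plot: points should hug the reference line under normality
qqnorm(obs, main = "Normal Q-Q plot")
qqline(obs)

par(op)
```

The mean and standard deviation are computed before calling `curve()` because `curve()` reuses the name `x` for its own plotting grid.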
Normal Distribution plots of SVL for Type C
The normality of the SVL variable for the type C lizards is also unsatisfactory, as the plots display features similar to those of the mass variable.
The plots above show the marginal densities without any distribution imposed on the data. Each variable has two modes, and the two variables appear to be highly correlated with each other. This is problematic because the techniques we are using accommodate only one mode. The problem might have been resolved once the bivariate density was taken into account, but unfortunately that is not so in this case.
It is evident that we are not working with a unimodal distribution. This is an issue because a single mean vector is used as the measure of central tendency within each group, and it would explain the distortion in the normality plots for the type C lizards. It suggests that either another variable needs to be taken into account in the analysis or a method that accommodates multiple modes must be used.
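One way to draw marginal and bivariate density plots like the ones discussed above is sketched below. It assumes `kde2d()` from the MASS package for the bivariate estimate, which may differ from what was used for this post, and the two-cluster sample is simulated purely for illustration.

```r
library(MASS)  # for kde2d(); MASS ships with standard R installations

# Placeholder data: a simulated two-cluster sample (NOT the lizard measurements)
set.seed(2)
mass <- c(rnorm(10, 1.6, 0.25), rnorm(10, 2.7, 0.15))
svl  <- 3.9 + 0.25 * mass + rnorm(20, 0, 0.04)

op <- par(mfrow = c(1, 3))

# Marginal kernel density estimates, with no parametric shape imposed
plot(density(mass), main = "Marginal density: mass")
plot(density(svl),  main = "Marginal density: svl")

# Bivariate kernel density estimate on a 50 x 50 grid, drawn as contours
dens2d <- kde2d(mass, svl, n = 50)
contour(dens2d, xlab = "mass", ylab = "svl", main = "Bivariate density")
points(mass, svl, pch = 16, cex = 0.6)

par(op)
```

Two separate sets of closed contours in the last panel would suggest that the bimodality persists in the joint distribution.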
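```r
library(dplyr)  # %>%, filter(), select()
# mvnTest() is assumed to come from the mvnormalTest package

# Multivariate normality test for the type C lizards
typecnorm <- data %>%
  filter(type == "C") %>%
  select(!type) %>%
  mvnTest(B = 2000)
typecnorm$mv.test[2]
## p-value
##  0.1145
```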
The above is a test of multivariate normality. The p-value is considerably higher than the usual cut-off of 0.05, so one might have been fairly comfortable that the normality assumption was met. However, failing to reject normality does not mean the data have the kind of distribution the technique needs: with only 20 observations, a sample with more than one peak can still pass the test. I therefore now interpret these normality tests with more caution.
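A small simulation can make this caution concrete. The sketch below estimates how often a normality test rejects a clearly bimodal population when only 20 observations are drawn. It rests on two assumptions that are not part of the original analysis: the univariate Shapiro-Wilk test (`shapiro.test()` in base R) stands in for the multivariate test used above, and the mixture distribution is hypothetical.

```r
# Estimate how often a normality test rejects a bimodal population at n = 20.
# Assumptions: Shapiro-Wilk stands in for the multivariate test used in this
# post, and the mixture below is hypothetical.
set.seed(3)
n_sim <- 2000
n     <- 20

reject <- replicate(n_sim, {
  # 50/50 mixture of two unit-variance normals whose means are 3 SDs apart
  grp <- rbinom(n, 1, 0.5)
  x   <- rnorm(n, mean = ifelse(grp == 1, 3, 0), sd = 1)
  shapiro.test(x)$p.value < 0.05
})

# Share of bimodal samples that the test flags as non-normal
rejection_rate <- mean(reject)
rejection_rate
```

Any rejection rate below 1 means that some bimodal samples of this size pass the test, which is the situation described above.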
Normal Distribution plots of Mass for Type S
With a sample this large, the central limit theorem could be relied on for the normality assumption. However, I still find the plots useful for spotting signs of subgroups. In this case the normality assumption is well satisfied.
Normal Distribution plots of Snout Length for Type S
For the most part the normality assumption is satisfied, but there does seem to be a sign of a small subgroup: the histogram has a small peak occurring in the right tail. This is easier to see in the density plot.
| Statistic | Value |
|---|---|
| Tn | 0.0521 |
| p-value | 0.144 |
| Result | YES |
The univariate plots for the mass variable do not inspire confidence that the data come from a bivariate normal distribution. However, given that the sample size is twenty, the sampling distribution of the sample mean may already be close to normal. It should be noted that the distribution for the C group seems to be bimodal and may possibly require another type of analysis.
Although the data for snout-vent length come closer to normality, the bimodal nature of the C group distribution remains an issue.
The bivariate plot shows a strong linear correlation between mass and snout-vent length within each type of lizard, and the two types appear to differ from each other. This is hard to infer from the means and standard deviations in the descriptive summary table alone (as is the case in the book). Most of the points lie within the confidence-region ellipses, apart from a few that deviate slightly from the overall trend.

Covariance matrix for type C:

| | mass | svl |
|---|---|---|
| mass | 0.35 | 0.09 |
| svl | 0.09 | 0.03 |

Covariance matrix for type S:

| | mass | svl |
|---|---|---|
| mass | 0.51 | 0.15 |
| svl | 0.15 | 0.04 |

| TestName | Statistic | DegreesFreedom | Pval |
|---|---|---|---|
| BoxM | 3.02 | 3 | 0.39 |
Box's M test does not reject equality of the covariance matrices (p = 0.39), so the covariance matrices are treated as equal across the two groups of lizards.
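For readers who want to see what lies behind a Box's M table like the one above, here is a minimal base R sketch using the usual chi-square approximation. The function name `box_m` is made up for this illustration, the data are simulated placeholders, and the result reported in this post was presumably produced by a package function instead.

```r
# Box's M test for equality of covariance matrices across groups.
# x: numeric matrix or data frame of responses; g: grouping factor.
box_m <- function(x, g) {
  x <- as.matrix(x)
  g <- droplevels(as.factor(g))
  p <- ncol(x)
  k <- nlevels(g)
  n <- as.vector(table(g))   # group sizes, in factor-level order
  N <- sum(n)

  # Per-group covariance matrices and the pooled covariance matrix
  covs   <- lapply(split(as.data.frame(x), g), cov)
  pooled <- Reduce(`+`, Map(function(S, ni) (ni - 1) * S, covs, n)) / (N - k)

  log_det <- function(S) as.numeric(determinant(S, logarithm = TRUE)$modulus)
  M <- (N - k) * log_det(pooled) -
    sum((n - 1) * vapply(covs, log_det, numeric(1)))

  # Chi-square approximation with Box's scale correction
  u    <- (sum(1 / (n - 1)) - 1 / (N - k)) *
    (2 * p^2 + 3 * p - 1) / (6 * (p + 1) * (k - 1))
  stat <- (1 - u) * M
  df   <- p * (p + 1) * (k - 1) / 2
  c(statistic = stat, df = df,
    p.value = pchisq(stat, df, lower.tail = FALSE))
}

# Placeholder data (simulated, NOT the lizard measurements)
set.seed(4)
sim <- data.frame(
  type = rep(c("C", "S"), times = c(20, 40)),
  mass = rnorm(60, 2.3, 0.65)
)
sim$svl <- 3.7 + 0.27 * sim$mass + rnorm(60, 0, 0.06)

res <- box_m(sim[, c("mass", "svl")], sim$type)
res
```

With two variables and two groups the degrees of freedom come out as 2 × 3 × 1 / 2 = 3, which agrees with the table above.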
The multivariate tests of skewness and kurtosis show that the values are within tolerable ranges.
Finally, I consider outliers. Although three potential outliers are identified by their Mahalanobis distances, these points are consistent with the trend between the variables and are not treated as outliers. Therefore no values were removed for this analysis.
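A minimal sketch of how potential outliers can be screened with Mahalanobis distances in base R follows. The data are simulated placeholders, and the 97.5% chi-square cutoff is an assumption, since the cutoff used for this analysis is not stated.

```r
# Placeholder data (simulated, NOT the lizard measurements)
set.seed(5)
mass <- rnorm(40, 2.4, 0.7)
svl  <- 3.7 + 0.27 * mass + rnorm(40, 0, 0.06)
x    <- cbind(mass = mass, svl = svl)

# Squared Mahalanobis distance of each row from the sample mean vector
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# Flag rows beyond the 97.5% chi-square quantile with p = 2 degrees of freedom
# (assumed cutoff; the post does not state which one was used)
cutoff  <- qchisq(0.975, df = ncol(x))
flagged <- which(d2 > cutoff)
flagged
```

Flagged rows are only candidates: as noted above, a point far from the centre can still lie along the trend between the two variables.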
First we calculate the mean vector containing Mass and Snout Length for each type.
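As a rough illustration of this step, the mean vector and covariance matrix of each type can be obtained by splitting the responses by group. The data frame `sim` is a simulated placeholder whose column names (`type`, `mass`, `svl`) mirror the ones used in this post.

```r
# Placeholder data (simulated, NOT the lizard measurements)
set.seed(6)
sim <- data.frame(
  type = rep(c("C", "S"), times = c(20, 40)),
  mass = rnorm(60, 2.3, 0.65)
)
sim$svl <- 3.7 + 0.27 * sim$mass + rnorm(60, 0, 0.06)

# Split the two responses by type, then take column means and covariances
by_type   <- split(sim[, c("mass", "svl")], sim$type)
mean_vecs <- lapply(by_type, colMeans)
cov_mats  <- lapply(by_type, cov)

mean_vecs$C
mean_vecs$S
```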
The book uses large-sample inference.
```r
# poolvar <- (nS - 1) / (nS + nC - 2) * typeScov + (nC - 1) / (nS + nC - 2) * typeCcov
# poolvar
```
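Under large-sample inference the covariance matrices are not pooled. Assuming the usual large-sample form of the two-sample statistic, T² = (x̄₁ − x̄₂)ᵀ (S₁/n₁ + S₂/n₂)⁻¹ (x̄₁ − x̄₂) compared with a chi-square distribution on p degrees of freedom, a minimal sketch looks like this. The function name `hotelling_large` and the simulated data are illustrative only.

```r
# Large-sample two-sample Hotelling T^2 (covariance matrices not pooled).
# x1, x2: numeric matrices with one row per observation.
hotelling_large <- function(x1, x2) {
  x1 <- as.matrix(x1)
  x2 <- as.matrix(x2)
  d  <- colMeans(x1) - colMeans(x2)              # difference in mean vectors
  V  <- cov(x1) / nrow(x1) + cov(x2) / nrow(x2)  # estimated covariance of d
  T2 <- sum(d * solve(V, d))                     # d' V^{-1} d
  p  <- ncol(x1)
  c(T2 = T2, df = p, p.value = pchisq(T2, df = p, lower.tail = FALSE))
}

# Placeholder data (simulated, NOT the lizard measurements)
set.seed(7)
sim_group <- function(n, mass_mean, svl_intercept) {
  mass <- rnorm(n, mass_mean, 0.65)
  cbind(mass = mass, svl = svl_intercept + 0.27 * mass + rnorm(n, 0, 0.06))
}
x_C <- sim_group(20, 2.2, 3.80)
x_S <- sim_group(40, 2.4, 3.67)

res <- hotelling_large(x_C, x_S)
res
```

The statistic depends on the inverse of a nearly singular matrix when the variables are highly correlated, so it should be computed from the raw data, not from rounded summary tables.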
Now we create the simultaneous confidence intervals.
| Variable | Lower | Upper |
|---|---|---|
| Mass | -0.55 | 0.30 |
| Svl | -0.03 | 0.21 |
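The intervals above can be approximately reproduced from the rounded summary statistics reported earlier, assuming large-sample chi-square simultaneous intervals for the difference C minus S. Treating the first covariance table as type C is an inference from the summary table (0.59² ≈ 0.35), and the variable names below are made up for this sketch.

```r
# Summary statistics as reported in the tables of this post (rounded values)
n_C <- 20
n_S <- 40
mean_C <- c(mass = 2.24, svl = 4.39)
mean_S <- c(mass = 2.37, svl = 4.31)
# Variances: diagonals of the two covariance tables; the first table is taken
# to be type C because 0.59^2 is about 0.35 (an inference, not stated in the post)
var_C <- c(mass = 0.35, svl = 0.03)
var_S <- c(mass = 0.51, svl = 0.04)

p     <- 2
alpha <- 0.05
crit  <- sqrt(qchisq(1 - alpha, df = p))  # large-sample chi-square multiplier

d  <- mean_C - mean_S                     # difference in means, C minus S
se <- sqrt(var_C / n_C + var_S / n_S)     # unpooled standard errors

ci <- data.frame(
  Variable = names(d),
  Lower    = d - crit * se,
  Upper    = d + crit * se,
  row.names = NULL
)
ci
```

With these rounded inputs the mass interval comes out at roughly -0.56 to 0.30 and the SVL interval at roughly -0.04 to 0.20, which agrees with the table above to within rounding of the inputs. Both intervals contain zero.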
```r
# det of covariance matrix > 0
# did not include the correlation
# need code for single mean comparison
# need code for paired means comparison
# sample size calculation

# Determinant of the covariance matrix of columns 3 to 5
covmat <- cov(datafile[, 3:5])
det(covmat)
```