Men & Women’s Track Records by Country
- Test for a mean difference in the track records between women and men. You may assume they share the same covariance structure. Calculate the p-value.
- Determine the first two principal components for the standardized variables of women’s track records. Prepare a table showing the correlations of the standardized variables with the principal components and the cumulative percentage of the total variance explained by the two components.
- Interpret the two principal components obtained in part (b). (note that the first component is essentially a normalized unit vector and might measure the athletic excellence of a given nation and the second component might measure the relative strength of a nation at the various running distances.)
- Rank the nations based on their score on the first principal component. Does this ranking correspond with your intuitive notion of athletic excellence for the various countries?
- Repeat parts (b), (c) and (d) with the men’s track data.
- Using the first two principal components of the women’s data and the first two of the men’s data, test for a difference in the two population principal components. Calculate the p-value and compare it to your answer in part (a). Note that the principal components returned by prcomp when you use standardized values are also standardized. You’ll want to use the loadings to transform the original data. Also, pay attention to the signs on the loadings. The first two principal components of the two data sets can be interpreted as the same thing, but you may get opposite signs, which can throw off Hotelling’s T-squared statistic.
- Perform a factor analysis of the national track records for women. Use the sample covariance matrix S and interpret the factors. Repeat the analysis with the sample correlation matrix R. Does it make a difference if R, rather than S, is factored? Explain.
- Perform a factor analysis of the national track records for men. Is the appropriate factor model for the men’s data different from the one for the women’s data? If not, are the interpretations of the factors roughly the same? If the models are different, explain the differences.
- (STAT488 only) Convert the national track records for women to speeds measured in meters per second. Notice the records for 800m, 1500m, 3000m and marathon are given in minutes. A marathon is 26.2 miles or 42195 meters long. Perform a principal components analysis using the covariance matrix S of this speed data. Compare the results with parts (b)-(d) above. Do your interpretations of the components differ? Rank the nations based on their score in the first principal component. How does the ranking compare? Which analysis do you prefer?
- (STAT488 only) Repeat the process with the men’s track data, that is, convert it to speed and perform the analysis in parts (b)-(d) above.
- (STAT488 only) Using the speed data, use the first two principal components to test for a difference in the two populations. Calculate the p-value and compare it to your answer in part (a) and (f) above. As with part (f), you may need to adjust the loadings so they match in the two principal components.
a. Testing for mean difference in track records between men & women (\(\alpha = 0.05\)).
\(H_{0}: \mu_{men} = \mu_{women}\)
\(H_{A}: \mu_{men} \neq \mu_{women}\)
Test stat: 82.838
Numerator df: 6
Denominator df: 101
P-value: < 0.001 (printed as 0 after rounding)
Conclusion: Because the p-value is below 0.05, we reject the null hypothesis. There is statistically significant evidence that the mean national track records differ between men and women.
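The test in part (a) can be sketched as a two-sample Hotelling T-squared test with a pooled covariance matrix. This is a minimal illustration under stated assumptions, not the code behind the numbers above: the helper name `hotelling_two_sample` and the simulated matrices `women_sim` and `men_sim` are made up for the example. With 54 nations per group and 6 events, the degrees of freedom come out as 6 and 101, matching those reported.

```r
# Two-sample Hotelling T^2 test assuming a common covariance matrix.
# `x` and `y` are numeric matrices (rows = nations, columns = events).
hotelling_two_sample <- function(x, y) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  n1 <- nrow(x); n2 <- nrow(y); p <- ncol(x)
  d <- colMeans(x) - colMeans(y)                      # mean difference vector
  s_pooled <- ((n1 - 1) * cov(x) + (n2 - 1) * cov(y)) / (n1 + n2 - 2)
  t2 <- as.numeric((n1 * n2) / (n1 + n2) * t(d) %*% solve(s_pooled, d))
  df1 <- p
  df2 <- n1 + n2 - p - 1
  f_stat <- t2 * df2 / ((n1 + n2 - 2) * p)            # T^2 rescaled to an F statistic
  p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)
  list(t2 = t2, f = f_stat, df1 = df1, df2 = df2, p_value = p_value)
}

# Simulated stand-ins for the two 54 x 6 tables (NOT the assignment data).
set.seed(1)
women_sim <- matrix(rnorm(54 * 6, mean = 1), nrow = 54)
men_sim <- matrix(rnorm(54 * 6, mean = 0), nrow = 54)
res <- hotelling_two_sample(women_sim, men_sim)
res$df1
res$df2
```

Reporting both `t2` and `f` makes it clear which of the two a printed "test stat" refers to.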
b. Loadings of the first two principal components for the standardized variables of women’s track records, with a table of the loadings and the cumulative proportion of the total variance explained by the two components.
Standard deviations (1, .., p=6):
[1] 2.2399302 0.7239524 0.4643059 0.3514499 0.2716888 0.2137607
Rotation (n x k) = (6 x 6):
PC1 PC2 PC3 PC4 PC5
100m -0.4137534 0.3846299 -0.12054812 0.5735489 -0.3939281
200m -0.4202377 0.3824032 -0.06535754 0.1897090 0.3176902
400m -0.4061645 0.3929376 0.44254638 -0.5768634 0.1985368
800m -0.4226394 -0.2858530 -0.09940074 -0.3891141 -0.6981406
1500m -0.4067668 -0.3159300 -0.67477321 -0.1371183 0.4276702
marathon -0.3783590 -0.6081973 0.56581787 0.3634138 0.1848636
PC6
100m -0.42684706
200m 0.73210645
400m -0.33555163
800m 0.30161790
1500m -0.27875957
marathon -0.02337895
| | PC1 | PC2 |
|---|---|---|
| 100m | -0.4137534 | 0.3846299 |
| 200m | -0.4202377 | 0.3824032 |
| 400m | -0.4061645 | 0.3929376 |
| 800m | -0.4226394 | -0.2858530 |
| 1500m | -0.4067668 | -0.3159300 |
| marathon | -0.3783590 | -0.6081973 |
| Standard deviation | 2.23993 | 0.7239524 |
| Proportion of Variance | 0.83621 | 0.0873500 |
| Cumulative Proportion | 0.83621 | 0.9235700 |
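Part (b) asks for the correlations of the standardized variables with the components. For standardized variables, the correlation between variable i and component j is the loading times the component's standard deviation. A short sketch in R, re-entering the loadings and standard deviations printed above (the object names are illustrative):

```r
events <- c("100m", "200m", "400m", "800m", "1500m", "marathon")

# Standard deviations and first two loading columns, copied from the prcomp output.
sdev <- c(2.2399302, 0.7239524, 0.4643059, 0.3514499, 0.2716888, 0.2137607)
loadings12 <- cbind(
  PC1 = c(-0.4137534, -0.4202377, -0.4061645, -0.4226394, -0.4067668, -0.3783590),
  PC2 = c( 0.3846299,  0.3824032,  0.3929376, -0.2858530, -0.3159300, -0.6081973)
)
rownames(loadings12) <- events

# cor(Z_i, PC_j) = e_ij * sqrt(lambda_j): scale each loading column by its sdev.
correlations <- sweep(loadings12, 2, sdev[1:2], "*")

# Proportion and cumulative proportion of total variance.
prop_var <- sdev^2 / sum(sdev^2)
cum_var <- cumsum(prop_var)[1:2]

round(correlations, 3)
round(cum_var, 5)
```

With a fitted object `fit <- prcomp(..., scale. = TRUE)`, the same quantities would come from `fit$rotation` and `fit$sdev`.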
c. Interpretation of principal components from part (b)
The loadings of the first principal component are nearly equal across all six events, which suggests a single latent variable, such as a nation’s overall athletic excellence (as indicated in the exercise), accounts for most of the variance (83.6%). Because the variables are times and every loading is negative, a higher PC1 score corresponds to faster records. The second principal component’s loadings are positive for the shorter distances (100m to 400m) and negative for the longer distances (800m to marathon), so it contrasts a nation’s strength in sprints with its strength in distance events.
d. Ranking Countries based on PC1
As far as I know, the rankings align fairly well with track talent: the countries that rank highest on this principal component are at least competitive in the Olympics.
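Because the sign of a principal component is arbitrary, a ranking on PC1 only makes sense once the component is oriented. The sketch below shows one way to do that in R. The five countries and their records are made up purely for illustration (they are not the assignment data), and the column names are hypothetical.

```r
# Hypothetical records for illustration only.
records <- data.frame(
  country = c("A", "B", "C", "D", "E"),
  t100 = c(10.9, 11.3, 11.6, 12.0, 12.4),    # seconds
  t400 = c(49.5, 51.0, 52.3, 54.1, 56.0),    # seconds
  t1500 = c(4.00, 4.10, 4.22, 4.35, 4.60),   # minutes
  stringsAsFactors = FALSE
)

pca <- prcomp(records[, -1], center = TRUE, scale. = TRUE)

# Orient PC1 so that a larger score means faster (smaller) times.
# If the loadings on the time variables are positive, flip the scores.
pc1 <- pca$x[, 1]
if (sum(pca$rotation[, 1]) > 0) pc1 <- -pc1

ranking <- records$country[order(pc1, decreasing = TRUE)]
ranking
```

Country A is fastest in every event here, so it ranks first regardless of which sign `prcomp` happens to return.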
e. Parts (b)-(d) with men’s track records
Standard deviations (1, .., p=6):
[1] 2.2264396 0.7001597 0.4577477 0.4326613 0.3111293 0.2433370
Rotation (n x k) = (6 x 6):
PC1 PC2 PC3 PC4 PC5
100m -0.4013958 -0.5138807 0.4789843 -0.01687632 -0.2969661
200m -0.4180798 -0.3917657 0.1620888 -0.17179519 0.5364074
400m -0.4035829 -0.2881765 -0.7458936 0.42347137 -0.1323242
800m -0.4104405 0.3266872 -0.2985046 -0.65176976 0.2267076
1500m -0.4208110 0.3494365 0.1138092 -0.13633585 -0.6628797
marathon -0.3945481 0.5201637 0.2930638 0.58947631 0.3402392
PC6
100m 0.50686134
200m -0.57289582
400m 0.02966769
800m 0.39938562
1500m -0.47944005
marathon 0.15693998
| | PC1 | PC2 |
|---|---|---|
| 100m | -0.4013958 | -0.5138807 |
| 200m | -0.4180798 | -0.3917657 |
| 400m | -0.4035829 | -0.2881765 |
| 800m | -0.4104405 | 0.3266872 |
| 1500m | -0.4208110 | 0.3494365 |
| marathon | -0.3945481 | 0.5201637 |
| Standard deviation | 2.22644 | 0.7001597 |
| Proportion of Variance | 0.82617 | 0.0817000 |
| Cumulative Proportion | 0.82617 | 0.9078800 |
The loadings of the first principal component are nearly equal across events, which suggests that a latent variable such as a nation’s athletic excellence (as indicated in the exercise) is responsible for much of the variance. The second principal component’s loadings are negative for the shorter distances and positive for the longer distances, the reverse of the signs in the women’s analysis, so it again contrasts sprint strength with distance strength. The national rankings based on the first principal component also seem to align with national track strength as I understand it.
f. Testing for differences in Population Principal components using PC1 & PC2 for Men’s/Women’s National Track Records
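One possible way to set up this comparison, following the hints in the problem statement, is sketched below. It aligns the signs of the men's loadings with the women's, projects the original records onto the first two loading vectors, and applies a two-sample Hotelling T-squared test to the two-dimensional scores. The data are simulated stand-ins (`sim_group`, `women_sim`, and `men_sim` are hypothetical), so no result for the actual assignment is implied.

```r
set.seed(42)

# Simulated stand-in for a 54 x 6 table of records driven by one common factor.
sim_group <- function(n, shift) {
  talent <- rnorm(n)
  sapply(1:6, function(j) shift + talent + rnorm(n, sd = 0.4))
}
women_sim <- sim_group(54, 12)
men_sim <- sim_group(54, 10)

# Loadings from PCA on the standardized variables.
load_w <- prcomp(women_sim, scale. = TRUE)$rotation[, 1:2]
load_m <- prcomp(men_sim, scale. = TRUE)$rotation[, 1:2]

# Align signs: flip any men's component that points opposite the women's.
flip <- sign(colSums(load_w * load_m))
flip[flip == 0] <- 1
load_m <- sweep(load_m, 2, flip, "*")

# Project the original (unstandardized) records onto the loadings.
score_w <- women_sim %*% load_w
score_m <- men_sim %*% load_m

# Two-sample Hotelling T^2 on the p = 2 scores, pooled covariance.
n1 <- nrow(score_w); n2 <- nrow(score_m); p <- 2
d <- colMeans(score_w) - colMeans(score_m)
s_pooled <- ((n1 - 1) * cov(score_w) + (n2 - 1) * cov(score_m)) / (n1 + n2 - 2)
t2 <- as.numeric((n1 * n2) / (n1 + n2) * t(d) %*% solve(s_pooled, d))
f_stat <- t2 * (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p)
p_value <- pf(f_stat, p, n1 + n2 - p - 1, lower.tail = FALSE)
```

Skipping the sign alignment can turn two components with the same interpretation into scores with opposite signs, which inflates the mean difference artificially.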
g. Factor analysis of National Track Records for Women.
Call:
factanal(x = df.w, factors = 2, n.obs = 54, rotation = "none", method = "mle")
Uniquenesses:
100m 200m 400m 800m 1500m marathon
0.106 0.005 0.160 0.006 0.167 0.264
Loadings:
Factor1 Factor2
100m 0.926 -0.192
200m 0.962 -0.263
400m 0.905 -0.145
800m 0.942 0.327
1500m 0.889 0.209
marathon 0.795 0.323
Factor1 Factor2
SS loadings 4.910 0.382
Proportion Var 0.818 0.064
Cumulative Var 0.818 0.882
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 6.41 on 4 degrees of freedom.
The p-value is 0.171
Call:
factanal(factors = 2, covmat = w.rm, n.obs = 54, rotation = "varimax", method = "mle")
Uniquenesses:
100m 200m 400m 800m 1500m marathon
0.106 0.005 0.160 0.006 0.167 0.264
Loadings:
Factor1 Factor2
100m 0.824 0.464
200m 0.898 0.433
400m 0.778 0.485
800m 0.495 0.865
1500m 0.533 0.741
marathon 0.387 0.766
Factor1 Factor2
SS loadings 2.770 2.522
Proportion Var 0.462 0.420
Cumulative Var 0.462 0.882
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 6.41 on 4 degrees of freedom.
The p-value is 0.171
The choice of rotation affects how the variables load on each factor but not the fit of the model. Both analyses explain the same cumulative proportion of variance (88.2%) and give identical uniquenesses, because an orthogonal rotation such as varimax leaves each variable’s communality unchanged. Factoring R rather than S also makes no difference here: maximum likelihood factor analysis is scale invariant, and factanal works from the correlation matrix whether it is given the raw data or a covariance matrix. After the varimax rotation the factors are easier to interpret: Factor 1 loads most heavily on the sprints (100m, 200m, 400m) and Factor 2 on the longer distances (800m, 1500m, marathon).
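The claim that rotation leaves the uniquenesses unchanged can be checked directly from the two loading tables printed above, since each uniqueness is one minus the sum of squared loadings for that variable. A small sketch in R, re-entering the printed (rounded) loadings; agreement is therefore only to about two or three decimals:

```r
events <- c("100m", "200m", "400m", "800m", "1500m", "marathon")
factors <- c("Factor1", "Factor2")

# Loadings copied from the unrotated and varimax factanal output for women.
unrotated <- matrix(c(0.926, -0.192,
                      0.962, -0.263,
                      0.905, -0.145,
                      0.942,  0.327,
                      0.889,  0.209,
                      0.795,  0.323),
                    ncol = 2, byrow = TRUE, dimnames = list(events, factors))
varimax_rot <- matrix(c(0.824, 0.464,
                        0.898, 0.433,
                        0.778, 0.485,
                        0.495, 0.865,
                        0.533, 0.741,
                        0.387, 0.766),
                      ncol = 2, byrow = TRUE, dimnames = list(events, factors))
uniq_printed <- c(0.106, 0.005, 0.160, 0.006, 0.167, 0.264)

# Uniqueness = 1 - communality = 1 - row sum of squared loadings.
uniq_unrot <- 1 - rowSums(unrotated^2)
uniq_rot <- 1 - rowSums(varimax_rot^2)

round(cbind(printed = uniq_printed, unrotated = uniq_unrot, varimax = uniq_rot), 3)
```

The total of the squared loadings is also preserved, which is why the cumulative variance stays at 0.882 while the split between the two factors changes.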
h. Factor analysis of National Track Records for Men.
Call:
factanal(x = df.m, factors = 2, n.obs = 54, rotation = "none", method = "mle")
Uniquenesses:
100m 200m 400m 800m 1500m marathon
0.154 0.009 0.250 0.164 0.027 0.208
Loadings:
Factor1 Factor2
100m 0.914 -0.102
200m 0.983 -0.158
400m 0.865
800m 0.859 0.312
1500m 0.881 0.445
marathon 0.797 0.396
Factor1 Factor2
SS loadings 4.700 0.488
Proportion Var 0.783 0.081
Cumulative Var 0.783 0.865
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 5.5 on 4 degrees of freedom.
The p-value is 0.239
Call:
factanal(factors = 2, covmat = m.rm, n.obs = 54, rotation = "varimax", method = "mle")
Uniquenesses:
100m 200m 400m 800m 1500m marathon
0.154 0.009 0.250 0.164 0.027 0.208
Loadings:
Factor1 Factor2
100m 0.806 0.443
200m 0.895 0.436
400m 0.691 0.522
800m 0.523 0.750
1500m 0.464 0.870
marathon 0.424 0.782
Factor1 Factor2
SS loadings 2.598 2.589
Proportion Var 0.433 0.432
Cumulative Var 0.433 0.865
Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 5.5 on 4 degrees of freedom.
The p-value is 0.239
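The two-factor model is not rejected for either group (p = 0.171 for women, p = 0.239 for men), so the same factor model appears appropriate for both. Under the varimax rotation the loadings are similar for men and women: Factor 1 is dominated by the sprints and Factor 2 by the longer distances. The two factors also account for similar proportions of the variance in national track records (88.2% for women and 86.5% for men).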
i. Converting Women’s National Track Records to Average Speed (m/s)
Units: avg. m/s
| Country | 100m | 200m | 400m | 800m | 1500m | marathon |
|---|---|---|---|---|---|---|
| ARG | 8.643042 | 8.718396 | 7.619048 | 6.504065 | 5.882353 | 4.678353 |
| AUS | 8.992806 | 8.996851 | 8.225375 | 6.734007 | 6.218905 | 4.900355 |
| AUT | 8.968610 | 8.810573 | 7.902015 | 6.872852 | 6.172840 | 4.556203 |
| BEL | 8.976661 | 8.896797 | 7.774538 | 6.768190 | 6.127451 | 4.916113 |
| BER | 8.726003 | 8.676790 | 7.504690 | 6.441224 | 5.827506 | 4.037490 |
| BRA | 8.952552 | 8.849557 | 7.902015 | 6.768190 | 5.995204 | 4.770708 |
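The conversion behind these speeds can be sketched as follows: divide each distance in meters by the record time in seconds, first multiplying by 60 the events recorded in minutes. The helper `to_speed` is illustrative, and the ARG times below are back-computed from the ARG speeds above rather than quoted from the raw data.

```r
# Distances in meters for the six events used in the analysis.
distance_m <- c("100m" = 100, "200m" = 200, "400m" = 400,
                "800m" = 800, "1500m" = 1500, marathon = 42195)

# TRUE for events whose records are given in minutes rather than seconds.
in_minutes <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)

# Convert one nation's record times to average speeds in m/s.
to_speed <- function(times) {
  seconds <- ifelse(in_minutes, times * 60, times)
  distance_m / seconds
}

# Times implied by the ARG speeds (seconds, seconds, seconds, min, min, min).
arg_times <- c(11.57, 22.94, 52.50, 2.05, 4.25, 150.32)
arg_speed <- to_speed(arg_times)
round(arg_speed, 6)
```

For a whole data frame of times, the same function could be applied row by row, for example with `t(apply(times_matrix, 1, to_speed))`, where `times_matrix` is a hypothetical numeric matrix with one row per nation.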
Standard deviations (1, .., p=6):
[1] 2.2394797 0.7372135 0.4298389 0.3581696 0.2840273 0.2180105
Rotation (n x k) = (6 x 6):
PC1 PC2 PC3 PC4 PC5 PC6
s100 0.4110369 -0.3985261 0.19870602 -0.5452687 0.3871234 0.43076577
s200 0.4194769 -0.3817117 0.11579002 -0.1503919 -0.2699084 -0.75462556
s400 0.4059147 -0.3822980 -0.55004814 0.4729653 -0.2383517 0.32560833
s800 0.4213789 0.2724519 0.07729831 0.4566153 0.6932049 -0.23066019
s1500 0.4105739 0.3252716 0.61695997 0.1892135 -0.4742597 0.29028244
smar 0.3797236 0.6076921 -0.50787892 -0.4605094 -0.1225483 -0.03863049
| | PC1 | PC2 |
|---|---|---|
| s100 | 0.4110369 | -0.3985261 |
| s200 | 0.4194769 | -0.3817117 |
| s400 | 0.4059147 | -0.3822980 |
| s800 | 0.4213789 | 0.2724519 |
| s1500 | 0.4105739 | 0.3252716 |
| smar | 0.3797236 | 0.6076921 |
| Standard deviation | 2.23948 | 0.7372135 |
| Proportion of Variance | 0.83588 | 0.0905800 |
| Cumulative Proportion | 0.83588 | 0.9264600 |
j. Repeating (i) with Men’s Track Records
Units: avg. m/s
| Country | 100m | 200m | 400m | 800m | 1500m | marathon |
|---|---|---|---|---|---|---|
| Argentina | 9.775171 | 9.818360 | 8.661758 | 7.532957 | 6.793478 | 5.427568 |
| Australia | 10.070493 | 9.970090 | 9.013069 | 7.662835 | 7.082153 | 5.515254 |
| Austria | 9.852217 | 9.779951 | 8.733624 | 7.532957 | 6.983240 | 5.318787 |
| Belgium | 9.861933 | 9.905894 | 8.884940 | 7.707129 | 7.002801 | 5.528695 |
| Bermuda | 9.737098 | 9.852217 | 8.837826 | 7.448790 | 6.756757 | 4.804605 |
| Brazil | 10.000000 | 10.055304 | 9.031384 | 7.843137 | 7.002801 | 5.579135 |
Standard deviations (1, .., p=6):
[1] 2.2135294 0.7218360 0.4710989 0.4387464 0.3252378 0.2429571
Rotation (n x k) = (6 x 6):
PC1 PC2 PC3 PC4 PC5 PC6
s100 0.4013532 0.5044835 -0.48337352 0.03846601 0.3388272 -0.48422984
s200 0.4180942 0.3929087 -0.16824214 0.17014061 -0.5665582 0.54090531
s400 0.4026901 0.3010357 0.73492220 -0.43653013 0.1240380 -0.03411095
s800 0.4111819 -0.3304516 0.30737166 0.63871947 -0.2199149 -0.41343524
s1500 0.4217625 -0.3523522 -0.08502157 0.14181362 0.6319581 0.52082007
smar 0.3936996 -0.5168620 -0.31020634 -0.59240216 -0.3179448 -0.17203808
| | PC1 | PC2 |
|---|---|---|
| s100 | 0.4013532 | 0.5044835 |
| s200 | 0.4180942 | 0.3929087 |
| s400 | 0.4026901 | 0.3010357 |
| s800 | 0.4111819 | -0.3304516 |
| s1500 | 0.4217625 | -0.3523522 |
| smar | 0.3936996 | -0.5168620 |
| Standard deviation | 2.213529 | 0.721836 |
| Proportion of Variance | 0.816620 | 0.086840 |
| Cumulative Proportion | 0.816620 | 0.903460 |
For both parts (i) and (j), the principal components have loadings very similar in magnitude to those of the time-based analysis, apart from sign. The first principal component weights every event nearly equally and may capture a latent national talent variable. Because the variables are now speeds and the PC1 loadings are positive, a higher PC1 score again corresponds to faster records. The second principal component separates the short-distance events from the long-distance events by sign. I prefer the speed analysis because all of the variables share the same unit of measurement, though it does not seem to offer much other benefit: the proportions of variance explained are almost unchanged.
k. Testing for differences in Population Principal components using the first two speed-based principal components for Men’s/Women’s National Track Records