Adrita Paria
A few things are revealed by the density plots:
For both men and women, none of the speeds is normally distributed.
Speeds are generally higher for men.
The density plots for men have multimodal shapes, which suggests a mixture of different populations in the dataset.
The multimodal nature is not so prominent in the women's dataset.
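As an illustration of how a mixture of populations shows up in a density plot, here is a minimal sketch on hypothetical data. The vector `speed` and its two sub-populations are assumptions made up for the example and are not the records analysed here. The Shapiro-Wilk test is added only as one possible numerical companion to the visual check.

```r
# Hypothetical speeds (m/s): two well-separated sub-populations, built from
# normal quantiles so the example is deterministic (NOT the report's data).
slow  <- qnorm(ppoints(50), mean = 8.5,  sd = 0.2)
fast  <- qnorm(ppoints(50), mean = 10.0, sd = 0.2)
speed <- c(slow, fast)

# Kernel density estimate; two separate peaks suggest a mixture.
d <- density(speed)
plot(d, main = "Density of hypothetical speeds", xlab = "speed (m/s)")

# Shapiro-Wilk test: a small p-value is evidence against normality.
sw <- shapiro.test(speed)
sw$p.value
```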
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 0.7518 0.2884 0.1111 0.09639 0.08226 0.07689 0.06495
## Proportion of Variance 0.8197 0.1206 0.0179 0.01347 0.00981 0.00857 0.00612
## Cumulative Proportion 0.8197 0.9404 0.9583 0.97172 0.98153 0.99011 0.99623
## PC8
## Standard deviation 0.05101
## Proportion of Variance 0.00377
## Cumulative Proportion 1.00000
For the men's data, 94.04% of the total variability in the dataset is explained by the first two sample PCs. This is also evident from the position of the elbow in the scree plot. So we are going to consider the first two sample PCs.
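The proportions in the summary follow directly from the standard deviations: each PC's variance is its squared standard deviation, divided by the total. The sketch below recomputes them from the rounded values printed above. The variable names are illustrative. For a fitted `prcomp` object the same values are stored in its `sdev` element.

```r
# Standard deviations of the eight sample PCs, copied from the summary above.
sdev <- c(0.7518, 0.2884, 0.1111, 0.09639,
          0.08226, 0.07689, 0.06495, 0.05101)

# Variance of each PC is sdev^2; divide by the total to get proportions.
prop_var <- sdev^2 / sum(sdev^2)
cum_prop <- cumsum(prop_var)

# Rows correspond to "Proportion of Variance" and "Cumulative Proportion".
round(rbind(prop_var, cum_prop), 4)
```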
The coefficients of the variables in the first two sample PCs are:
## PC1 PC2
## 100m -0.3153454 -0.59955831
## 200m -0.3251131 -0.47163652
## 400m -0.3090385 -0.23213403
## 800m -0.3123950 -0.05861613
## 1500m -0.3417021 0.07902820
## 5000m -0.4063680 0.29596157
## 10000m -0.4186956 0.29760231
## Marathon -0.3802133 0.42232798
Several facts are revealed by PCA. These are:
The uniformity of weights in the first principal component is really a reflection of the considerable amount of structure in the original data. The coefficients of the variables in PC1 are nearly equal, so PC1 weighs all the variables to almost the same extent, as is clear from the previous table. Using the PC1 scores we can therefore rank the individuals (here, the nations). The analysis of the first PC thus yields a ranking that is at least plausible from the viewpoint of subjective judgements and, in any case, has considerable intrinsic interest. This technique is very useful when such judgements are not easy to come by.
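To make "nearly equal" concrete, the sketch below compares the PC1 coefficients from the table with an exactly uniform unit vector, then ranks three hypothetical nations by their PC1 score. The matrix `X` of centred speeds is invented for illustration. Since the sign of a principal component is arbitrary and all PC1 coefficients are negative here, a more negative score corresponds to higher speeds overall.

```r
events <- c("100m", "200m", "400m", "800m",
            "1500m", "5000m", "10000m", "Marathon")

# PC1 coefficients copied from the table above.
pc1 <- setNames(c(-0.3153454, -0.3251131, -0.3090385, -0.3123950,
                  -0.3417021, -0.4063680, -0.4186956, -0.3802133), events)

# A loading vector has unit length (up to rounding).
sum(pc1^2)

# Cosine similarity between -pc1 and the uniform unit vector (1/sqrt(8), ...).
equal_w <- rep(1 / sqrt(length(pc1)), length(pc1))
cos_sim <- sum(-pc1 * equal_w)
cos_sim

# Hypothetical centred speeds: A above average everywhere, C below.
X <- rbind(A = rep(0.3, 8), B = rep(0, 8), C = rep(-0.3, 8))
scores <- drop(X %*% pc1)

# Most negative score first, i.e. fastest overall first.
ranking <- names(sort(scores))
ranking
```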
The second PC can be interpreted as a measure of the relative strength of a given nation at various distances. Hence a value of the second PC near 0 indicates that the nation has achieved at about the same level at both long and short distances. Extreme values at either end indicate imbalance in achievement. Thus the second PC serves as a measure of differential achievement.
Let us try to look at the summary of PC2 for all the individuals.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.765363 -0.162563 0.004861 0.000000 0.186200 0.514977
Since PC2 serves as a measure of differential achievement, extreme negative values will indicate better performances in shorter distances and extreme positive values will indicate better performances in longer distances.
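The sign pattern behind this interpretation can be checked with the PC2 coefficients from the table: they are negative for the four shortest events and positive for the four longest. In the sketch below, `sprinter` and `stayer` are hypothetical nations whose centred speeds are above average only at short or only at long distances. The numbers are invented for illustration.

```r
events <- c("100m", "200m", "400m", "800m",
            "1500m", "5000m", "10000m", "Marathon")

# PC2 coefficients copied from the table above.
pc2 <- setNames(c(-0.59955831, -0.47163652, -0.23213403, -0.05861613,
                   0.07902820,  0.29596157,  0.29760231,  0.42232798), events)

# Hypothetical centred speeds (deviation from the mean speed per event).
sprinter <- setNames(c(0.3, 0.3, 0.2, 0, 0, 0,   0,   0),   events)
stayer   <- setNames(c(0,   0,   0,   0, 0, 0.2, 0.2, 0.3), events)

# PC2 score = weighted sum of the centred speeds.
score_sprinter <- sum(sprinter * pc2)  # negative: stronger at short distances
score_stayer   <- sum(stayer * pc2)    # positive: stronger at long distances
c(sprinter = score_sprinter, stayer = score_stayer)
```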
So, looking at the values of PC2 we can actually classify the individuals into three groups - one with better performances in shorter distances, one with equivalent performances in both shorter and longer distances and one with better performances in longer distances.
The PC2 score intervals of the three groups are
## [1] "(-0.767,-0.339]" "(-0.339,0.0882]" "(0.0882,0.516]"
## [1] Group - 1
## [1] bermuda domrep malaysia singapore thailand usa wsamoa
## [1] Group - 2
## [1] argentina australia brazil burma canada chile
## [7] czech france gdr frg gbni greece
## [13] hungary indonesia italy korea luxembourg png
## [19] philippines poland sweden switzerland taipei ussr
## [1] Group - 3
## [1] austria belgium china colombia cookis costa
## [7] denmark finland guatemala india ireland israel
## [13] japan kenya dprkorea mauritius mexico netherlands
## [19] nz norway portugal rumania spain turkey
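The interval labels above have the form produced by `cut()` when asked for three equal-width bins, so the grouping can plausibly be sketched as follows. That `cut()` was the function actually used is an inference from the labels, not something stated in the report. The vector `pc2_scores` holds only the five summary values printed earlier (minimum, quartiles, median, maximum), not the full set of 55 scores.

```r
# Five PC2 values taken from the summary above (min, Q1, median, Q3, max).
pc2_scores <- c(-0.765363, -0.162563, 0.004861, 0.186200, 0.514977)

# Split the observed range into three equal-width intervals.
groups <- cut(pc2_scores, breaks = 3)

# Interval labels and the group index assigned to each value.
levels(groups)
as.integer(groups)
```

Because the bins have equal width rather than equal counts, the groups need not be the same size, which is consistent with the 7, 24 and 24 nations listed above.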
Now by plotting the PC1 vs PC2 let us see how good this grouping is:
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 0.9894 0.3136 0.23001 0.15263 0.08494 0.07216 0.05833
## Proportion of Variance 0.8372 0.0841 0.04525 0.01993 0.00617 0.00445 0.00291
## Cumulative Proportion 0.8372 0.9213 0.96654 0.98647 0.99264 0.99709 1.00000
For the women's data, 92.13% of the total variability in the dataset is explained by the first two sample PCs. This is also evident from the position of the elbow in the scree plot. So we are going to consider the first two sample PCs.
The coefficients of the variables in the first two sample PCs are:
## PC1 PC2
## 100m 0.2909154 -0.426435627
## 200m 0.3414473 -0.558102524
## 400m 0.3390391 -0.383489492
## 800m 0.3053161 -0.006696912
## 1500m 0.3857370 0.198885662
## 3000m 0.3999272 0.254144679
## Marathon 0.5309254 0.505391101
Let us try to look at the summary of PC2 for all the individuals.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.6370 -0.2188 0.0231 0.0000 0.2759 0.6135
## [1] "(-0.638,-0.22]" "(-0.22,0.197]" "(0.197,0.615]"
## [1] Group - 1
## [1] argentina bermuda brazil canada czech domrep
## [7] finland gdr guatemala mauritius philippines poland
## [13] taipei wsamoa
## [1] Group - 2
## [1] australia austria belgium burma colombia france
## [7] frg gbni greece hungary india indonesia
## [13] israel italy kenya malaysia netherlands png
## [19] rumania sweden thailand turkey usa ussr
## [1] Group - 3
## [1] chile china cookis costa denmark ireland
## [7] japan korea dprkorea luxembourg mexico nz
## [13] norway portugal singapore spain switzerland
Now by plotting the PC1 vs PC2 let us see how good this grouping is:
So we have seen that, for both men and women, the participating countries fall into three groups by performance: one with dominant performance on shorter tracks, one with equivalent performance on shorter and longer tracks, and one with dominant performance on longer tracks.
Now let us see to what extent the performances of men and women are in agreement.
## women_groups
## men_groups 1 2 3
## 1 3 3 1
## 2 8 12 4
## 3 3 9 12
##
## Fisher's Exact Test for Count Data
##
## data: tb
## p-value = 0.0717
## alternative hypothesis: two.sided
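The test above can be re-run from the printed cross-tabulation alone. The sketch below rebuilds the table by hand, under the same name `tb` that appears in the output, and also computes the share of nations that fall in the same group for men and women. That agreement share is an added descriptive summary and is not part of the original output.

```r
# Cross-tabulation of men's groups (rows) against women's groups (columns),
# copied from the output above.
tb <- matrix(c(3,  3,  1,
               8, 12,  4,
               3,  9, 12),
             nrow = 3, byrow = TRUE,
             dimnames = list(men_groups   = as.character(1:3),
                             women_groups = as.character(1:3)))

# Fisher's exact test of independence between the two groupings.
ft <- fisher.test(tb)
ft$p.value

# Share of nations placed in the same group for both men and women.
agree <- sum(diag(tb)) / sum(tb)
agree
```

With a p-value of about 0.07, independence of the two groupings is not rejected at the 5% level, although it would be at the 10% level. On the diagonal, 27 of the 55 nations (about 49%) fall in the same group for men and women.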