1984 Olympic Track Data Analysis

Adrita Paria

Dataset for men

Dataset for women

Density plots of the variables

A few things are revealed by the density plots. These are:

Sample correlation matrix plot for men

Sample correlation matrix plot for women

Principal component analysis for men

## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5     PC6     PC7
## Standard deviation     0.7518 0.2884 0.1111 0.09639 0.08226 0.07689 0.06495
## Proportion of Variance 0.8197 0.1206 0.0179 0.01347 0.00981 0.00857 0.00612
## Cumulative Proportion  0.8197 0.9404 0.9583 0.97172 0.98153 0.99011 0.99623
##                            PC8
## Standard deviation     0.05101
## Proportion of Variance 0.00377
## Cumulative Proportion  1.00000

The first two sample PCs explain 94.04% of the total variability in the dataset. The elbow in the scree plot points to the same cutoff, so we retain the first two sample PCs.
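The proportions in the table follow directly from the standard deviations: each component's share is its variance divided by the total variance. A minimal sketch in R that re-derives the figures from the rounded standard deviations in the summary output (the variable names are ours, not from the original analysis):

```r
# Standard deviations of the eight men's PCs, copied from the summary output
# (rounded values, so results agree only to about three decimals).
sdev_men <- c(0.7518, 0.2884, 0.1111, 0.09639,
              0.08226, 0.07689, 0.06495, 0.05101)

# Variance of each PC is the squared standard deviation.
var_men <- sdev_men^2

# Share of total variance per PC, and the running total.
prop_var <- var_men / sum(var_men)
cum_prop <- cumsum(prop_var)

print(round(rbind(prop_var, cum_prop), 4))
```

The second entry of `cum_prop` is the 94.04% quoted for the first two PCs.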

Principal component analysis for men (Contd.)

The coefficients (loadings) of the variables in the first two sample PCs are:

##                 PC1         PC2
## 100m     -0.3153454 -0.59955831
## 200m     -0.3251131 -0.47163652
## 400m     -0.3090385 -0.23213403
## 800m     -0.3123950 -0.05861613
## 1500m    -0.3417021  0.07902820
## 5000m    -0.4063680  0.29596157
## 10000m   -0.4186956  0.29760231
## Marathon -0.3802133  0.42232798
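In this table every PC1 loading has the same sign, while the PC2 loadings are negative up to 800m and positive from 1500m onward. As a sanity check, the two loading vectors should each have unit length and be orthogonal to one another. A short sketch that verifies this from the printed values (the object names are ours):

```r
# Loadings of the first two men's PCs, copied from the table above.
events <- c("100m", "200m", "400m", "800m",
            "1500m", "5000m", "10000m", "Marathon")
pc1 <- c(-0.3153454, -0.3251131, -0.3090385, -0.3123950,
         -0.3417021, -0.4063680, -0.4186956, -0.3802133)
pc2 <- c(-0.59955831, -0.47163652, -0.23213403, -0.05861613,
          0.07902820,  0.29596157,  0.29760231,  0.42232798)
loadings_men <- cbind(PC1 = pc1, PC2 = pc2)
rownames(loadings_men) <- events

# t(L) %*% L should be close to the 2 x 2 identity matrix:
# diagonal = squared lengths, off-diagonal = dot product of PC1 and PC2.
gram <- crossprod(loadings_men)
print(round(gram, 4))
```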

Several facts are revealed by PCA. These are:

Principal component analysis for men (Contd.)

Let us look at the summary of the PC2 scores across all countries.

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -0.765363 -0.162563  0.004861  0.000000  0.186200  0.514977
## [1] "(-0.767,-0.339]" "(-0.339,0.0882]" "(0.0882,0.516]"
## [1] Group - 1
## [1] bermuda   domrep    malaysia  singapore thailand  usa       wsamoa
## [1] Group - 2
##  [1] argentina   australia   brazil      burma       canada      chile      
##  [7] czech       france      gdr         frg         gbni        greece     
## [13] hungary     indonesia   italy       korea       luxembourg  png        
## [19] philippines poland      sweden      switzerland taipei      ussr
## [1] Group - 3
##  [1] austria     belgium     china       colombia    cookis      costa      
##  [7] denmark     finland     guatemala   india       ireland     israel     
## [13] japan       kenya       dprkorea    mauritius   mexico      netherlands
## [19] nz          norway      portugal    rumania     spain       turkey
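The three interval labels in the output are consistent with what R's `cut()` produces when asked for three equal-width bins over the range of the scores, although the original code is not shown. A sketch under that assumption, using a small made-up score vector whose minimum and maximum match the summary:

```r
# Hypothetical PC2 scores: only the minimum, median and maximum are taken
# from the summary above; the other values and the names are made up.
pc2_scores <- c(a = -0.765363, b = -0.20, c = 0.004861,
                d = 0.30, e = 0.514977)

# breaks = 3 splits the observed range into three equal-width intervals
# (R widens the outer limits slightly so the extremes are included).
bins <- cut(pc2_scores, breaks = 3)
print(levels(bins))

# Relabel the bins as groups 1..3 and list the members of each group.
groups <- as.integer(bins)
print(split(names(pc2_scores), groups))
```

Equal-width bins do not give equal-sized groups, which is in line with the 7, 24 and 24 countries listed for the three groups.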

Principal component analysis for men (Contd.)

Now let us plot PC1 against PC2 to see how good this grouping is:
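A sketch of how such a plot could be drawn in R. It uses simulated data rather than the Olympic records, and all object names are hypothetical; the grouping step assumes three equal-width bins on PC2.

```r
set.seed(1)  # simulated data only; not the Olympic records

# Hypothetical stand-in for the men's data: 55 rows, 8 correlated columns.
base <- rnorm(55)
x <- sapply(1:8, function(j) base + rnorm(55, sd = 0.3))
colnames(x) <- paste0("event", 1:8)

# PCA with prcomp defaults (centred, unscaled, i.e. covariance-based).
pca <- prcomp(x)
scores <- pca$x[, 1:2]
groups <- cut(scores[, "PC2"], breaks = 3, labels = c("1", "2", "3"))

# Scatter plot of PC1 against PC2, coloured by the PC2-based group.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
plot(scores[, "PC1"], scores[, "PC2"],
     col = as.integer(groups), pch = 19,
     xlab = "PC1", ylab = "PC2", main = "PC1 vs PC2 by group")
legend("topright", legend = levels(groups),
       col = 1:3, pch = 19, title = "Group")
invisible(dev.off())
```

Because the groups are defined by PC2 alone, they necessarily appear as horizontal bands in a plot of this kind; the informative part is how the countries spread along PC1 within each band.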

Principal component analysis for women

## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6     PC7
## Standard deviation     0.9894 0.3136 0.23001 0.15263 0.08494 0.07216 0.05833
## Proportion of Variance 0.8372 0.0841 0.04525 0.01993 0.00617 0.00445 0.00291
## Cumulative Proportion  0.8372 0.9213 0.96654 0.98647 0.99264 0.99709 1.00000

The first two sample PCs explain 92.13% of the total variability in the dataset. The elbow in the scree plot points to the same cutoff, so we again retain the first two sample PCs.

Principal component analysis for women (Contd.)

The coefficients (loadings) of the variables in the first two sample PCs are:

##                PC1          PC2
## 100m     0.2909154 -0.426435627
## 200m     0.3414473 -0.558102524
## 400m     0.3390391 -0.383489492
## 800m     0.3053161 -0.006696912
## 1500m    0.3857370  0.198885662
## 3000m    0.3999272  0.254144679
## Marathon 0.5309254  0.505391101
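The women's PC1 loadings are all positive whereas the men's are all negative. The sign of a principal component is arbitrary, so this difference alone carries no meaning. A small sketch on simulated data (not the Olympic records; all names are hypothetical) showing that flipping a component's sign mirrors the scores and leaves the variance unchanged:

```r
set.seed(2)  # simulated data only

# Hypothetical data: 55 rows, 7 positively correlated columns.
base <- rnorm(55)
x <- sapply(1:7, function(j) base + rnorm(55, sd = 0.4))

pca <- prcomp(x)

# Flip the sign of the first loading vector and recompute the scores.
flipped_loading <- -pca$rotation[, 1]
centred <- scale(x, center = pca$center, scale = FALSE)
flipped_scores <- as.vector(centred %*% flipped_loading)

# The flipped scores are the mirror image of the originals,
# and the variance along the component is identical.
print(all.equal(flipped_scores, -as.vector(pca$x[, 1])))
print(c(original = var(pca$x[, 1]), flipped = var(flipped_scores)))
```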

Principal component analysis for women (Contd.)

Let us look at the summary of the PC2 scores across all countries.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.6370 -0.2188  0.0231  0.0000  0.2759  0.6135
## [1] "(-0.638,-0.22]" "(-0.22,0.197]"  "(0.197,0.615]"
## [1] Group - 1
##  [1] argentina   bermuda     brazil      canada      czech       domrep     
##  [7] finland     gdr         guatemala   mauritius   philippines poland     
## [13] taipei      wsamoa
## [1] Group - 2
##  [1] australia   austria     belgium     burma       colombia    france     
##  [7] frg         gbni        greece      hungary     india       indonesia  
## [13] israel      italy       kenya       malaysia    netherlands png        
## [19] rumania     sweden      thailand    turkey      usa         ussr
## [1] Group - 3
##  [1] chile       china       cookis      costa       denmark     ireland    
##  [7] japan       korea       dprkorea    luxembourg  mexico      nz         
## [13] norway      portugal    singapore   spain       switzerland

Principal component analysis for women (Contd.)

Now let us plot PC1 against PC2 to see how good this grouping is:

Comparing the PCA results for men and women

##           women_groups
## men_groups  1  2  3
##          1  3  3  1
##          2  8 12  4
##          3  3  9 12
## 
##  Fisher's Exact Test for Count Data
## 
## data:  tb
## p-value = 0.0717
## alternative hypothesis: two.sided
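The test can be re-run directly from the printed contingency table. A minimal sketch, assuming the counts above are the complete table `tb` (the dimension names are taken from the output; the diagonal summary is our addition):

```r
# Cross-classification of the 55 countries, copied from the table above:
# rows = men's groups 1..3, columns = women's groups 1..3.
tb <- matrix(c(3,  3,  1,
               8, 12,  4,
               3,  9, 12),
             nrow = 3, byrow = TRUE,
             dimnames = list(men_groups = as.character(1:3),
                             women_groups = as.character(1:3)))

# Fisher's exact test of independence between the two groupings.
res <- fisher.test(tb)
print(res$p.value)

# Share of countries placed in the same group number for both sexes.
print(sum(diag(tb)) / sum(tb))
```

At the conventional 5% level, the reported p-value of 0.0717 would not reject independence of the men's and women's groupings, although 27 of the 55 countries (about 49%) fall on the diagonal.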