Homework I

Perform a factor analysis of the national track records for women given in Table 1.9. Check for outliers. Use the sample correlation matrix. Interpret the factors and compute factor scores.

Then convert the national track records to speeds measured in meters per second. Repeat the factor analysis with the speed data. Compare your results with the results obtained before. Which analysis do you prefer? Why?

Section 1: Preprocess the data


# Read in the data (tab-separated, no header row) and name the variables.
# The path points to a local copy of T1-9.dat; adjust it as needed.

t = read.table("d:\\T1-9.dat", sep = "\t", header = FALSE)

colnames(t) = c("country", "sp100", "sp200", "sp400", "sp800", "sp1500", "sp3000", 
    "marathon")

t

##    country sp100 sp200 sp400 sp800 sp1500 sp3000 marathon
## 1      ARG 11.57 22.94 52.50  2.05   4.25   9.19    150.3
## 2      AUS 11.12 22.23 48.63  1.98   4.02   8.63    143.5
## 3      AUT 11.15 22.70 50.62  1.94   4.05   8.78    154.3
## 4      BEL 11.14 22.48 51.45  1.97   4.08   8.82    143.1
## 5      BER 11.46 23.05 53.30  2.07   4.29   9.81    174.2
## 6      BRA 11.17 22.60 50.62  1.97   4.17   9.04    147.4
## 7      CAN 10.98 22.62 49.91  1.97   4.00   8.54    148.4
## 8      CHI 11.65 23.84 53.68  2.00   4.22   9.26    152.2
## 9      CHN 10.79 22.01 49.81  1.93   3.84   8.10    139.4
## 10     COL 11.31 22.92 49.64  2.04   4.34   9.37    155.2
## 11     COK 12.52 25.91 61.65  2.28   4.82  11.10    212.3
## 12     CRC 11.72 23.92 52.57  2.10   4.52   9.84    164.3
## 13     CZE 11.09 21.97 47.99  1.89   4.03   8.87    145.2
## 14     DEN 11.42 23.36 52.92  2.02   4.12   8.71    149.3
## 15     DOM 11.63 23.91 53.02  2.09   4.54   9.89    166.5
## 16     FIN 11.13 22.39 50.14  2.01   4.10   8.69    148.0
## 17     FRA 10.73 21.99 48.25  1.94   4.03   8.64    148.3
## 18     GER 10.81 21.71 47.60  1.92   3.96   8.51    141.4
## 19     GBR 11.10 22.10 49.43  1.94   3.97   8.37    135.2
## 20     GRE 10.83 22.67 50.56  2.00   4.09   8.96    153.4
## 21     GUA 11.92 24.50 55.64  2.15   4.48   9.71    171.3
## 22     HUN 11.41 23.06 51.50  1.99   4.02   8.55    148.5
## 23     INA 11.56 23.86 55.08  2.10   4.36   9.50    154.3
## 24     IND 11.38 22.82 51.05  2.00   4.10   9.11    158.1
## 25     IRL 11.43 23.02 51.07  2.01   3.98   8.36    142.2
## 26     ISR 11.45 23.15 52.06  2.07   4.24   9.33    156.4
## 27     ITA 11.14 22.60 51.31  1.96   3.98   8.59    143.5
## 28     JPN 11.36 23.33 51.93  2.01   4.16   8.74    139.4
## 29     KEN 11.62 23.37 51.56  1.97   3.96   8.39    138.5
## 30  KOR, S 11.49 23.80 53.67  2.09   4.24   9.01    146.1
## 31  KOR, N 11.80 25.10 56.23  1.97   4.25   8.96    145.3
## 32     LUX 11.76 23.96 56.07  2.07   4.35   9.21    149.2
## 33     MAS 11.50 23.37 52.56  2.12   4.39   9.31    169.3
## 34     MRI 11.72 23.83 54.62  2.06   4.33   9.24    167.1
## 35     MEX 11.09 23.13 48.89  2.02   4.19   8.89    144.1
## 36     MYA 11.66 23.69 52.96  2.03   4.20   9.08    158.4
## 37     NED 11.08 22.81 51.35  1.93   4.06   8.57    143.4
## 38     NZL 11.32 23.13 51.60  1.97   4.10   8.76    146.5
## 39     NOR 11.41 23.31 52.45  2.03   4.01   8.53    141.1
## 40     PNG 11.96 24.68 55.18  2.24   4.62  10.21    221.1
## 41     PHI 11.28 23.35 54.75  2.12   4.41   9.81    165.5
## 42     POL 10.93 22.13 49.28  1.95   3.99   8.53    144.2
## 43     POR 11.30 22.88 51.92  1.98   3.96   8.50    143.3
## 44     ROM 11.30 22.35 49.88  1.92   3.90   8.36    142.5
## 45     RUS 10.77 21.87 49.11  1.91   3.87   8.38    141.3
## 46     SAM 12.38 25.45 56.32  2.29   5.42  13.12    191.6
## 47     SIN 12.13 24.54 55.08  2.12   4.52   9.94    154.4
## 48     ESP 11.06 22.38 49.67  1.96   4.01   8.48    146.5
## 49     SWE 11.16 22.82 51.69  1.99   4.09   8.81    150.4
## 50     SUI 11.34 22.88 51.32  1.98   3.97   8.60    145.5
## 51     TPE 11.22 22.56 52.74  2.08   4.38   9.63    159.5
## 52     THA 11.33 23.30 52.60  2.06   4.38  10.07    162.4
## 53     TUR 11.25 22.71 53.15  2.01   3.92   8.53    151.4
## 54     USA 10.49 21.34 48.83  1.94   3.95   8.43    141.2


# For convenience, use the country codes as row names and drop the
# non-numeric country column

rownames(t) = t$country

(t1 = t[, -1])

##        sp100 sp200 sp400 sp800 sp1500 sp3000 marathon
## ARG    11.57 22.94 52.50  2.05   4.25   9.19    150.3
## AUS    11.12 22.23 48.63  1.98   4.02   8.63    143.5
## AUT    11.15 22.70 50.62  1.94   4.05   8.78    154.3
## BEL    11.14 22.48 51.45  1.97   4.08   8.82    143.1
## BER    11.46 23.05 53.30  2.07   4.29   9.81    174.2
## BRA    11.17 22.60 50.62  1.97   4.17   9.04    147.4
## CAN    10.98 22.62 49.91  1.97   4.00   8.54    148.4
## CHI    11.65 23.84 53.68  2.00   4.22   9.26    152.2
## CHN    10.79 22.01 49.81  1.93   3.84   8.10    139.4
## COL    11.31 22.92 49.64  2.04   4.34   9.37    155.2
## COK    12.52 25.91 61.65  2.28   4.82  11.10    212.3
## CRC    11.72 23.92 52.57  2.10   4.52   9.84    164.3
## CZE    11.09 21.97 47.99  1.89   4.03   8.87    145.2
## DEN    11.42 23.36 52.92  2.02   4.12   8.71    149.3
## DOM    11.63 23.91 53.02  2.09   4.54   9.89    166.5
## FIN    11.13 22.39 50.14  2.01   4.10   8.69    148.0
## FRA    10.73 21.99 48.25  1.94   4.03   8.64    148.3
## GER    10.81 21.71 47.60  1.92   3.96   8.51    141.4
## GBR    11.10 22.10 49.43  1.94   3.97   8.37    135.2
## GRE    10.83 22.67 50.56  2.00   4.09   8.96    153.4
## GUA    11.92 24.50 55.64  2.15   4.48   9.71    171.3
## HUN    11.41 23.06 51.50  1.99   4.02   8.55    148.5
## INA    11.56 23.86 55.08  2.10   4.36   9.50    154.3
## IND    11.38 22.82 51.05  2.00   4.10   9.11    158.1
## IRL    11.43 23.02 51.07  2.01   3.98   8.36    142.2
## ISR    11.45 23.15 52.06  2.07   4.24   9.33    156.4
## ITA    11.14 22.60 51.31  1.96   3.98   8.59    143.5
## JPN    11.36 23.33 51.93  2.01   4.16   8.74    139.4
## KEN    11.62 23.37 51.56  1.97   3.96   8.39    138.5
## KOR, S 11.49 23.80 53.67  2.09   4.24   9.01    146.1
## KOR, N 11.80 25.10 56.23  1.97   4.25   8.96    145.3
## LUX    11.76 23.96 56.07  2.07   4.35   9.21    149.2
## MAS    11.50 23.37 52.56  2.12   4.39   9.31    169.3
## MRI    11.72 23.83 54.62  2.06   4.33   9.24    167.1
## MEX    11.09 23.13 48.89  2.02   4.19   8.89    144.1
## MYA    11.66 23.69 52.96  2.03   4.20   9.08    158.4
## NED    11.08 22.81 51.35  1.93   4.06   8.57    143.4
## NZL    11.32 23.13 51.60  1.97   4.10   8.76    146.5
## NOR    11.41 23.31 52.45  2.03   4.01   8.53    141.1
## PNG    11.96 24.68 55.18  2.24   4.62  10.21    221.1
## PHI    11.28 23.35 54.75  2.12   4.41   9.81    165.5
## POL    10.93 22.13 49.28  1.95   3.99   8.53    144.2
## POR    11.30 22.88 51.92  1.98   3.96   8.50    143.3
## ROM    11.30 22.35 49.88  1.92   3.90   8.36    142.5
## RUS    10.77 21.87 49.11  1.91   3.87   8.38    141.3
## SAM    12.38 25.45 56.32  2.29   5.42  13.12    191.6
## SIN    12.13 24.54 55.08  2.12   4.52   9.94    154.4
## ESP    11.06 22.38 49.67  1.96   4.01   8.48    146.5
## SWE    11.16 22.82 51.69  1.99   4.09   8.81    150.4
## SUI    11.34 22.88 51.32  1.98   3.97   8.60    145.5
## TPE    11.22 22.56 52.74  2.08   4.38   9.63    159.5
## THA    11.33 23.30 52.60  2.06   4.38  10.07    162.4
## TUR    11.25 22.71 53.15  2.01   3.92   8.53    151.4
## USA    10.49 21.34 48.83  1.94   3.95   8.43    141.2

Section 2: Detect outliers using the Mahalanobis distance


# compute the mean and covariance of the sample

mu = apply(t1, 2, mean)

sigma = var(t1)

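# squared Mahalanobis distance of each country, divided by its degrees of
# freedom (the number of variables, p = 7)

d2_f = mahalanobis(t1, mu, sigma)/ncol(t1)

# flag countries whose D^2/df exceeds the rule-of-thumb threshold of 2.5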

d2_f[d2_f > 2.5]

##    COK KOR, N    PNG    SAM 
##  2.833  3.738  4.358  5.002
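The D^2/df > 2.5 screen above is a rule of thumb. Under multivariate normality the squared distance D^2 is approximately chi-square with p degrees of freedom, so a common alternative compares D^2 with an upper chi-square quantile. The sketch below is illustrative only. It uses simulated data in place of t1 (an assumption, so that it runs without T1-9.dat), checks that mahalanobis() equals the quadratic form (x_i - xbar)' S^{-1} (x_i - xbar), and applies both cutoffs. With p = 7 the 97.5% chi-square quantile is about 16.0, a little below 2.5 * 7 = 17.5. Every point flagged by the ratio rule is therefore also flagged by the chi-square rule.

```r
# Sketch on simulated data (assumption: 54 rows x 7 columns, like t1)
set.seed(1)
x = matrix(rnorm(54 * 7), nrow = 54, ncol = 7)
x[1, ] = x[1, ] + 4                  # plant one clear outlier in row 1

center = colMeans(x)
S = cov(x)
d2 = mahalanobis(x, center, S)

# the same distances from the quadratic form (x_i - xbar)' S^{-1} (x_i - xbar)
xc = sweep(x, 2, center)             # subtract the column means
d2_manual = rowSums((xc %*% solve(S)) * xc)
all.equal(d2, d2_manual)

p = ncol(x)
flag_ratio = d2 / p > 2.5                    # rule used in this homework
flag_chisq = d2 > qchisq(0.975, df = p)      # chi-square alternative
which(flag_ratio)
which(flag_chisq)
```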

Section 3: Principal Component Analysis


# compute the principal components from the sample correlation matrix

r = cor(t1)

pc = princomp(covmat = r)

summary(pc, loadings = TRUE)

## Importance of components:
##                        Comp.1  Comp.2 Comp.3  Comp.4 Comp.5   Comp.6
## Standard deviation     2.4099 0.79290 0.5285 0.35292 0.3016 0.233493
## Proportion of Variance 0.8297 0.08981 0.0399 0.01779 0.0130 0.007788
## Cumulative Proportion  0.8297 0.91947 0.9594 0.97717 0.9902 0.997957
##                          Comp.7
## Standard deviation     0.119592
## Proportion of Variance 0.002043
## Cumulative Proportion  1.000000
## 
## Loadings:
##          Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7
## sp100    -0.378 -0.407 -0.141  0.587 -0.167  0.540       
## sp200    -0.383 -0.414 -0.101  0.194        -0.745 -0.266
## sp400    -0.368 -0.459  0.237 -0.645  0.327  0.240  0.127
## sp800    -0.395  0.161  0.148 -0.295 -0.819        -0.195
## sp1500   -0.389  0.309 -0.422               -0.189  0.731
## sp3000   -0.376  0.423 -0.406         0.352  0.240 -0.572
## marathon -0.355  0.389  0.741  0.321  0.247
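Because princomp() receives a correlation matrix through covmat, the component variances are simply the eigenvalues of R. The trace of R equals p = 7, so each "Proportion of Variance" is an eigenvalue divided by 7. For Comp.1 above, 2.4099^2 / 7 = 0.8297. A minimal sketch on simulated data (an assumption, not the track records) confirms the correspondence:

```r
# Sketch on simulated data (an assumption; the homework uses r = cor(t1))
set.seed(2)
z = matrix(rnorm(50 * 4), ncol = 4)
z[, 2] = z[, 1] + 0.3 * z[, 2]       # make columns 1 and 2 strongly correlated
r = cor(z)

pc = princomp(covmat = r)
ev = eigen(r, symmetric = TRUE)$values

# squared component standard deviations are the eigenvalues of R
all.equal(unname(pc$sdev^2), ev)

# trace(R) = p, so "Proportion of Variance" is eigenvalue / p
prop = ev / ncol(r)
cumsum(prop)
```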


# scree plot; as the Cumulative Proportion row above shows, the first two
# components explain 91.9% of the total variance, so two factors are retained

screeplot(pc, type = "lines", main = "Track Records")

[Figure: scree plot of the principal-component variances, titled "Track Records"]

Section 4: Factor analysis with the number of factors suggested by the PCA


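# fit a two-factor maximum likelihood model and compute regression factor
# scores for each country; factanal() works on the correlation matrix and
# applies a varimax rotation by default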

f = factanal(t1, factors = 2, scores = "regression")

f$scores

##          Factor1  Factor2
## ARG     0.276670 -0.14895
## AUS    -0.224167 -0.92883
## AUT    -0.337920 -0.39071
## BEL    -0.137080 -0.63053
## BER     0.829383 -0.37438
## BRA     0.239356 -0.73942
## CAN    -0.541725 -0.44670
## CHI    -0.291102  0.94909
## CHN    -0.979386 -0.85722
## COL     0.830148 -0.69673
## COK     1.423276  2.75567
## CRC     1.013748  0.41697
## CZE     0.101122 -1.41907
## DEN    -0.639001  0.61118
## DOM     1.121671  0.32885
## FIN    -0.062645 -0.76973
## FRA     0.040070 -1.48176
## GER    -0.131008 -1.64215
## GBR    -0.534375 -0.86712
## GRE     0.032584 -0.73360
## GUA     0.397437  1.48282
## HUN    -0.855659  0.37235
## INA     0.298197  0.75455
## IND    -0.063218 -0.28561
## IRL    -1.079365  0.45541
## ISR     0.304325 -0.06521
## ITA    -0.648174 -0.28658
## JPN    -0.461147  0.41907
## KEN    -1.380684  0.98454
## KOR, S -0.321350  0.92436
## KOR, N -1.200851  2.67262
## LUX    -0.092742  1.18227
## MAS     0.616755  0.07920
## MRI     0.012954  0.93026
## MEX     0.007365 -0.25234
## MYA    -0.361331  0.85949
## NED    -0.539001 -0.18516
## NZL    -0.502111  0.20442
## NOR    -1.047773  0.73911
## PNG     1.176445  1.29854
## PHI     1.032926 -0.19268
## POL    -0.310313 -1.04210
## POR    -0.965463  0.22598
## ROM    -0.933500 -0.36277
## RUS    -0.600706 -1.23497
## SAM     4.990731  0.21413
## SIN     0.579774  1.46283
## ESP    -0.455531 -0.66349
## SWE    -0.282421 -0.23348
## SUI    -0.854898  0.15586
## TPE     1.270214 -1.11232
## THA     1.180387 -0.44756
## TUR    -0.970001  0.12525
## USA     0.029111 -2.11366


# print the fitted model: uniquenesses, loadings and the goodness-of-fit test

f

## 
## Call:
## factanal(x = t1, factors = 2, scores = "regression")
## 
## Uniquenesses:
##    sp100    sp200    sp400    sp800   sp1500   sp3000 marathon 
##    0.094    0.024    0.152    0.144    0.016    0.028    0.338 
## 
## Loadings:
##          Factor1 Factor2
## sp100    0.461   0.833  
## sp200    0.455   0.877  
## sp400    0.401   0.829  
## sp800    0.732   0.566  
## sp1500   0.882   0.454  
## sp3000   0.918   0.361  
## marathon 0.693   0.427  
## 
##                Factor1 Factor2
## SS loadings      3.216   2.987
## Proportion Var   0.459   0.427
## Cumulative Var   0.459   0.886
## 
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 31.43 on 8 degrees of freedom.
## The p-value is 0.000118
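Since factanal() works with the correlation matrix, each standardized variable has unit variance. That variance splits into a communality (the sum of its squared loadings) and a uniqueness, with uniqueness = 1 - communality. For the marathon, 0.693^2 + 0.427^2 = 0.663, so its uniqueness is about 0.337. It is printed as 0.338 because the loadings are rounded to three decimals. The sketch below repeats this for every event using the loadings copied from the output above, and also recovers the SS loadings row:

```r
# Rotated loadings copied from the factanal() output for the time data
# (rounded to three decimals, as printed)
L = matrix(c(0.461, 0.833,
             0.455, 0.877,
             0.401, 0.829,
             0.732, 0.566,
             0.882, 0.454,
             0.918, 0.361,
             0.693, 0.427),
           ncol = 2, byrow = TRUE,
           dimnames = list(c("sp100", "sp200", "sp400", "sp800",
                             "sp1500", "sp3000", "marathon"),
                           c("Factor1", "Factor2")))

communality = rowSums(L^2)          # variance explained by the two factors
uniqueness = 1 - communality        # remaining specific variance
round(uniqueness, 3)

colSums(L^2)                        # SS loadings per factor
colSums(L^2) / nrow(L)              # Proportion Var per factor
```

Up to the same rounding, the column sums reproduce the SS loadings (3.216 and 2.987) and the proportions reproduce the Proportion Var row.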

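From the varimax-rotated loadings, Factor 1 loads most heavily on the events of 800 m and longer (0.69 to 0.92), so we call it the "endurance factor". Factor 2 loads most heavily on the sprints of 400 m and shorter (0.83 to 0.88), so we call it the "speed factor". The 800 m loads substantially on both. Together the two factors account for 88.6% of the total standardized variance. Because the variables are times, a higher score means a slower performance. SAM's Factor 1 score of 4.99, for example, is by far the highest, marking it as the weakest country in the distance events. Note also that the likelihood-ratio test (chi-square 31.43 on 8 degrees of freedom, p = 0.000118) formally rejects the hypothesis that two factors are sufficient. The two-factor model is therefore best read as a convenient summary rather than an exact fit.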
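With scores = "regression", factanal() computes Thomson's regression scores. As we understand the method for an orthogonal rotation such as the default varimax, the score vector of observation j is f_j = Lambda' R^{-1} z_j. Here z_j is the standardized observation, R the sample correlation matrix, and Lambda the estimated loadings. The following sketch runs on simulated one-factor data (an assumption; the variable names v1 to v4 are hypothetical) and reproduces factanal()'s scores from that formula:

```r
# Sketch on simulated one-factor data (an assumption; not the track records)
set.seed(3)
n = 200
g = rnorm(n)                                    # latent factor
lam = c(0.8, 0.7, 0.6, 0.5)                     # true loadings
x = sapply(lam, function(l) l * g + sqrt(1 - l^2) * rnorm(n))
colnames(x) = paste0("v", 1:4)                  # hypothetical names

f = factanal(x, factors = 1, scores = "regression")

# regression scores: standardized data %*% R^{-1} %*% loadings
Lambda = unclass(f$loadings)
manual = scale(x) %*% solve(cor(x), Lambda)
all.equal(unname(manual), unname(f$scores))
```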


# scores on Factor 1 (endurance) as a one-column matrix

(sc1 = as.matrix(f$scores[, 1]))

##             [,1]
## ARG     0.276670
## AUS    -0.224167
## AUT    -0.337920
## BEL    -0.137080
## BER     0.829383
## BRA     0.239356
## CAN    -0.541725
## CHI    -0.291102
## CHN    -0.979386
## COL     0.830148
## COK     1.423276
## CRC     1.013748
## CZE     0.101122
## DEN    -0.639001
## DOM     1.121671
## FIN    -0.062645
## FRA     0.040070
## GER    -0.131008
## GBR    -0.534375
## GRE     0.032584
## GUA     0.397437
## HUN    -0.855659
## INA     0.298197
## IND    -0.063218
## IRL    -1.079365
## ISR     0.304325
## ITA    -0.648174
## JPN    -0.461147
## KEN    -1.380684
## KOR, S -0.321350
## KOR, N -1.200851
## LUX    -0.092742
## MAS     0.616755
## MRI     0.012954
## MEX     0.007365
## MYA    -0.361331
## NED    -0.539001
## NZL    -0.502111
## NOR    -1.047773
## PNG     1.176445
## PHI     1.032926
## POL    -0.310313
## POR    -0.965463
## ROM    -0.933500
## RUS    -0.600706
## SAM     4.990731
## SIN     0.579774
## ESP    -0.455531
## SWE    -0.282421
## SUI    -0.854898
## TPE     1.270214
## THA     1.180387
## TUR    -0.970001
## USA     0.029111


# scores on Factor 2 (speed) as a one-column matrix

(sc2 = as.matrix(f$scores[, 2]))

##            [,1]
## ARG    -0.14895
## AUS    -0.92883
## AUT    -0.39071
## BEL    -0.63053
## BER    -0.37438
## BRA    -0.73942
## CAN    -0.44670
## CHI     0.94909
## CHN    -0.85722
## COL    -0.69673
## COK     2.75567
## CRC     0.41697
## CZE    -1.41907
## DEN     0.61118
## DOM     0.32885
## FIN    -0.76973
## FRA    -1.48176
## GER    -1.64215
## GBR    -0.86712
## GRE    -0.73360
## GUA     1.48282
## HUN     0.37235
## INA     0.75455
## IND    -0.28561
## IRL     0.45541
## ISR    -0.06521
## ITA    -0.28658
## JPN     0.41907
## KEN     0.98454
## KOR, S  0.92436
## KOR, N  2.67262
## LUX     1.18227
## MAS     0.07920
## MRI     0.93026
## MEX    -0.25234
## MYA     0.85949
## NED    -0.18516
## NZL     0.20442
## NOR     0.73911
## PNG     1.29854
## PHI    -0.19268
## POL    -1.04210
## POR     0.22598
## ROM    -0.36277
## RUS    -1.23497
## SAM     0.21413
## SIN     1.46283
## ESP    -0.66349
## SWE    -0.23348
## SUI     0.15586
## TPE    -1.11232
## THA    -0.44756
## TUR     0.12525
## USA    -2.11366

Section 5: Detect outliers among the factor scores


# outliers on Factor 1: with a single variable, D^2 is the squared z-score,
# so the threshold D^2 > 2.5 flags scores with |z| > 1.58

mu1 = apply(sc1, 2, mean)

sigma1 = var(sc1)

d2_f1 = mahalanobis(sc1, mu1, sigma1)

d2_f1[d2_f1 > 2.5]

##   SAM 
## 25.64


# outliers on Factor 2, using the same rule

mu2 = mean(sc2)

sigma2 = var(sc2)

d2_f2 = mahalanobis(sc2, mu2, sigma2)

d2_f2[d2_f2 > 2.5]

##    COK    GER KOR, N    USA 
##  7.932  2.817  7.461  4.666
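For a single column such as sc1 or sc2, the Mahalanobis distance reduces to the squared z-score, (x - xbar)^2 / s^2. The threshold of 2.5 is therefore the same as |z| > sqrt(2.5), about 1.58. Roughly 11% of normally distributed scores exceed that by chance, so this univariate screen is fairly lenient. A short check on simulated scores (an assumption standing in for sc1 or sc2):

```r
# Sketch on simulated scores (an assumption standing in for sc1 or sc2)
set.seed(4)
s = matrix(rnorm(54), ncol = 1)

d2 = mahalanobis(s, mean(s), var(s))
z2 = as.vector(((s - mean(s)) / sd(s))^2)
all.equal(d2, z2)

sqrt(2.5)              # cutoff on |z| implied by D^2 > 2.5
2 * pnorm(-sqrt(2.5))  # chance that a standard normal score exceeds it
```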

Section 6: Repeat the factor analysis with the speed data


# convert the 800 m, 1500 m, 3000 m and marathon (42,195 m) records from
# minutes to seconds, so that every column is in seconds

t2 = cbind(t1[, 1:3], t1[, 4:7] * 60)

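# speed (m/s) = distance (m) / time (s); the double transpose lines the
# distance vector up with the events, so each column gets its own distance

meters = c(100, 200, 400, 800, 1500, 3000, 42195)

(t3 = t(meters/t(t2)))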

##        sp100 sp200 sp400 sp800 sp1500 sp3000 marathon
## ARG    8.643 8.718 7.619 6.504  5.882  5.441    4.678
## AUS    8.993 8.997 8.225 6.734  6.219  5.794    4.900
## AUT    8.969 8.811 7.902 6.873  6.173  5.695    4.556
## BEL    8.977 8.897 7.775 6.768  6.127  5.669    4.916
## BER    8.726 8.677 7.505 6.441  5.828  5.097    4.037
## BRA    8.953 8.850 7.902 6.768  5.995  5.531    4.771
## CAN    9.107 8.842 8.014 6.768  6.250  5.855    4.740
## CHI    8.584 8.389 7.452 6.667  5.924  5.400    4.620
## CHN    9.268 9.087 8.031 6.908  6.510  6.173    5.045
## COL    8.842 8.726 8.058 6.536  5.760  5.336    4.532
## COK    7.987 7.719 6.488 5.848  5.187  4.505    3.312
## CRC    8.532 8.361 7.609 6.349  5.531  5.081    4.279
## CZE    9.017 9.103 8.335 7.055  6.203  5.637    4.844
## DEN    8.757 8.562 7.559 6.601  6.068  5.741    4.709
## DOM    8.598 8.365 7.544 6.380  5.507  5.056    4.225
## FIN    8.985 8.933 7.978 6.633  6.098  5.754    4.752
## FRA    9.320 9.095 8.290 6.873  6.203  5.787    4.743
## GER    9.251 9.212 8.403 6.944  6.313  5.875    4.972
## GBR    9.009 9.050 8.092 6.873  6.297  5.974    5.200
## GRE    9.234 8.822 7.911 6.667  6.112  5.580    4.584
## GUA    8.389 8.163 7.189 6.202  5.580  5.149    4.105
## HUN    8.764 8.673 7.767 6.700  6.219  5.848    4.736
## INA    8.651 8.382 7.262 6.349  5.734  5.263    4.558
## IND    8.787 8.764 7.835 6.667  6.098  5.488    4.448
## IRL    8.749 8.688 7.832 6.633  6.281  5.981    4.944
## ISR    8.734 8.639 7.683 6.441  5.896  5.359    4.498
## ITA    8.977 8.850 7.796 6.803  6.281  5.821    4.902
## JPN    8.803 8.573 7.703 6.633  6.010  5.721    5.044
## KEN    8.606 8.558 7.758 6.768  6.313  5.959    5.079
## KOR, S 8.703 8.403 7.453 6.380  5.896  5.549    4.813
## KOR, N 8.475 7.968 7.114 6.768  5.882  5.580    4.840
## LUX    8.503 8.347 7.134 6.441  5.747  5.429    4.713
## MAS    8.696 8.558 7.610 6.289  5.695  5.371    4.154
## MRI    8.532 8.393 7.323 6.472  5.774  5.411    4.209
## MEX    9.017 8.647 8.182 6.601  5.967  5.624    4.882
## MYA    8.576 8.442 7.553 6.568  5.952  5.507    4.439
## NED    9.025 8.768 7.790 6.908  6.158  5.834    4.903
## NZL    8.834 8.647 7.752 6.768  6.098  5.708    4.802
## NOR    8.764 8.580 7.626 6.568  6.234  5.862    4.985
## PNG    8.361 8.104 7.249 5.952  5.411  4.897    3.180
## PHI    8.865 8.565 7.306 6.289  5.669  5.097    4.250
## POL    9.149 9.038 8.117 6.838  6.266  5.862    4.878
## POR    8.850 8.741 7.704 6.734  6.313  5.882    4.908
## ROM    8.850 8.949 8.019 6.944  6.410  5.981    4.935
## RUS    9.285 9.145 8.145 6.981  6.460  5.967    4.977
## SAM    8.078 7.859 7.102 5.822  4.613  3.811    3.671
## SIN    8.244 8.150 7.262 6.289  5.531  5.030    4.554
## ESP    9.042 8.937 8.053 6.803  6.234  5.896    4.800
## SWE    8.961 8.764 7.738 6.700  6.112  5.675    4.676
## SUI    8.818 8.741 7.794 6.734  6.297  5.814    4.833
## TPE    8.913 8.865 7.584 6.410  5.708  5.192    4.408
## THA    8.826 8.584 7.605 6.472  5.708  4.965    4.331
## TUR    8.889 8.807 7.526 6.633  6.378  5.862    4.644
## USA    9.533 9.372 8.192 6.873  6.329  5.931    4.982
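Both analyses use the correlation matrix, so it is worth separating the two steps of the conversion. Multiplying a column by 60 is a linear rescaling, which leaves both the correlation matrix and the Mahalanobis distances unchanged. Taking the reciprocal (distance / time) is nonlinear and does change them. The sketch below checks this on simulated positive times (an assumption made so that it runs without T1-9.dat):

```r
# Sketch on simulated positive "times" (an assumption, not T1-9.dat)
set.seed(5)
times = matrix(2 + rexp(40 * 3), ncol = 3)
d2 = function(x) mahalanobis(x, colMeans(x), cov(x))

# linear rescaling, e.g. minutes -> seconds in columns 2 and 3
scaled = sweep(times, 2, c(1, 60, 60), "*")
all.equal(cor(times), cor(scaled))    # correlations unchanged
all.equal(d2(times), d2(scaled))      # Mahalanobis distances unchanged

# reciprocal (speed-like) transformation is nonlinear
speeds = 1 / times
isTRUE(all.equal(d2(times), d2(speeds)))  # FALSE: the distances now differ
```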




## Detect outliers using the Mahalanobis distance

# compute the mean and covariance of the sample

mu = apply(t3, 2, mean)

sigma = var(t3)

# squared Mahalanobis distance divided by its degrees of freedom (p = 7)

d2_f_speed = mahalanobis(t3, mu, sigma)/ncol(t3)

# flag countries whose D^2/df exceeds the rule-of-thumb threshold of 2.5

d2_f_speed[d2_f_speed > 2.5]

## KOR, N    PNG    SAM 
##  3.488  3.347  3.453




## Principal Component Analysis

# compute the principal components from the sample correlation matrix

r = cor(t3)

pc = princomp(covmat = r)

summary(pc, loadings = TRUE)

## Importance of components:
##                        Comp.1  Comp.2  Comp.3  Comp.4  Comp.5   Comp.6
## Standard deviation     2.4150 0.80477 0.46868 0.35913 0.31580 0.237190
## Proportion of Variance 0.8332 0.09252 0.03138 0.01843 0.01425 0.008037
## Cumulative Proportion  0.8332 0.92571 0.95709 0.97551 0.98976 0.997796
##                          Comp.7
## Standard deviation     0.124216
## Proportion of Variance 0.002204
## Cumulative Proportion  1.000000
## 
## Loadings:
##          Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7
## sp100    -0.374 -0.425  0.177  0.568  0.210  0.521       
## sp200    -0.381 -0.417  0.117  0.159        -0.753 -0.267
## sp400    -0.367 -0.438 -0.371 -0.549 -0.400  0.243  0.123
## sp800    -0.393  0.152 -0.143 -0.374  0.796        -0.170
## sp1500   -0.390  0.282  0.463 -0.136        -0.161  0.710
## sp3000   -0.381  0.397  0.369        -0.361  0.257 -0.600
## marathon -0.359  0.440 -0.668  0.432 -0.154 -0.101  0.108


# again, the first two components explain 92.6% of the total variance
# (Cumulative Proportion 0.926), so two factors are retained




## Factor analysis with the number of factors suggested by the PCA

# fit a two-factor model and compute regression factor scores for each country

f = factanal(t3, factors = 2, scores = "regression")

f$scores

##          Factor1  Factor2
## ARG    -0.383008  0.21658
## AUS     0.246998  0.94961
## AUT     0.290101  0.38574
## BEL     0.087753  0.66858
## BER    -1.227208  0.56384
## BRA    -0.318517  0.76111
## CAN     0.640683  0.35819
## CHI     0.123145 -0.91366
## CHN     1.275292  0.76966
## COL    -0.874603  0.66085
## COK    -1.653836 -2.34956
## CRC    -1.063671 -0.45031
## CZE    -0.259961  1.56753
## DEN     0.724909 -0.69694
## DOM    -1.161628 -0.37558
## FIN     0.125996  0.75662
## FRA    -0.015501  1.50451
## GER     0.159719  1.73199
## GBR     0.717991  0.83794
## GRE    -0.140424  0.72745
## GUA    -0.509667 -1.41105
## HUN     0.949878 -0.44796
## INA    -0.423244 -0.72714
## IND    -0.172006  0.39028
## IRL     1.298979 -0.58831
## ISR    -0.487463  0.12313
## ITA     0.667499  0.27423
## JPN     0.610107 -0.56050
## KEN     1.560077 -1.10670
## KOR, S  0.354823 -0.99820
## KOR, N  1.225658 -2.68620
## LUX     0.094613 -1.17484
## MAS    -0.597696 -0.12140
## MRI    -0.062499 -0.91716
## MEX     0.101983  0.08357
## MYA     0.268492 -0.85219
## NED     0.690745  0.06258
## NZL     0.545995 -0.28144
## NOR     1.162568 -0.83814
## PNG    -1.298847 -1.20340
## PHI    -1.240038  0.25531
## POL     0.377503  1.04530
## POR     1.036459 -0.27500
## ROM     1.041750  0.35593
## RUS     0.656423  1.28687
## SAM    -4.108181 -0.60278
## SIN    -0.725533 -1.36453
## ESP     0.594220  0.60169
## SWE     0.260744  0.21293
## SUI     0.854736 -0.17251
## TPE    -1.452538  1.22594
## THA    -1.512813  0.53449
## TUR     0.933237 -0.07784
## USA     0.009806  2.28087


# print the fitted model: uniquenesses, loadings and the goodness-of-fit test

f

## 
## Call:
## factanal(x = t3, factors = 2, scores = "regression")
## 
## Uniquenesses:
##    sp100    sp200    sp400    sp800   sp1500   sp3000 marathon 
##    0.104    0.017    0.164    0.150    0.029    0.016    0.263 
## 
## Loadings:
##          Factor1 Factor2
## sp100    0.447   0.834  
## sp200    0.442   0.887  
## sp400    0.418   0.813  
## sp800    0.731   0.562  
## sp1500   0.863   0.475  
## sp3000   0.917   0.379  
## marathon 0.768   0.383  
## 
##                Factor1 Factor2
## SS loadings      3.281   2.975
## Proportion Var   0.469   0.425
## Cumulative Var   0.469   0.894
## 
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 27.13 on 8 degrees of freedom.
## The p-value is 0.00067


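# Factors 1 and 2 again represent the 'endurance' and 'speed' factors. Since
# speed is the reciprocal of time, the loadings keep their signs but the
# scores change sign: a high score now means a faster performance (e.g. SAM
# scores -4.11 on Factor 1 here, versus 4.99 with the time data).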

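I prefer the second analysis (speeds) because its variables have a clearer and more natural meaning. All of them are in the same unit (m/s), and a larger value always means a better performance, so high factor scores identify strong countries. Its outlier screen also differs from the first: it flags KOR, N, PNG and SAM, but no longer COK. In addition, the two-factor model fits slightly better (chi-square 27.13 versus 31.43 on 8 degrees of freedom; 89.4% versus 88.6% of the variance explained), although the test still rejects two factors at the 5% level. I therefore take the second set of results as the more reliable one. Note, however, that the common unit is not what changes the results. Both factor analyses use the correlation matrix, which already standardizes every variable, and the Mahalanobis distance is unaffected by rescaling a variable (for example, from minutes to seconds). The differences come from the reciprocal transformation from time to speed. This transformation is nonlinear and compresses the differences among the slowest times, such as SAM's 3000 m record.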