Problem 1

Load the data

women<-read.table(file="Data-HW4-track-women.dat", header=FALSE, quote="", sep="\t")
rforw<-data.matrix(women[,2:8])

center = function(v){v - mean(v)}
x = apply(rforw, 2, center)

y = women$V1; y

##  [1] ARG    AUS    AUT    BEL    BER    BRA    CAN    CHI    CHN    COL   
## [11] COK    CRC    CZE    DEN    DOM    FIN    FRA    GER    GBR    GRE   
## [21] GUA    HUN    INA    IND    IRL    ISR    ITA    JPN    KEN    KOR, S
## [31] KOR, N LUX    MAS    MRI    MEX    MYA    NED    NZL    NOR    PNG   
## [41] PHI    POL    POR    ROM    RUS    SAM    SIN    ESP    SWE    SUI   
## [51] TPE    THA    TUR    USA   
## 54 Levels: ARG AUS AUT BEL BER BRA CAN CHI CHN COK COL CRC CZE DEN ... USA

Question (a)

cor(rforw)

##           V2        V3        V4        V5        V6        V7        V8
## V2 1.0000000 0.9410886 0.8707802 0.8091758 0.7815510 0.7278784 0.6689597
## V3 0.9410886 1.0000000 0.9088096 0.8198258 0.8013282 0.7318546 0.6799537
## V4 0.8707802 0.9088096 1.0000000 0.8057904 0.7197996 0.6737991 0.6769384
## V5 0.8091758 0.8198258 0.8057904 1.0000000 0.9050509 0.8665732 0.8539900
## V6 0.7815510 0.8013282 0.7197996 0.9050509 1.0000000 0.9733801 0.7905565
## V7 0.7278784 0.7318546 0.6737991 0.8665732 0.9733801 1.0000000 0.7987302
## V8 0.6689597 0.6799537 0.6769384 0.8539900 0.7905565 0.7987302 1.0000000

eigen(cor(rforw))

## eigen() decomposition
## $values
## [1] 5.80762446 0.62869342 0.27933457 0.12455472 0.09097174 0.05451882
## [7] 0.01430226
## 
## $vectors
##            [,1]       [,2]       [,3]        [,4]        [,5]        [,6]
## [1,] -0.3777657 -0.4071756 -0.1405803  0.58706293 -0.16706891  0.53969730
## [2,] -0.3832103 -0.4136291 -0.1007833  0.19407501  0.09350016 -0.74493139
## [3,] -0.3680361 -0.4593531  0.2370255 -0.64543118  0.32727328  0.24009405
## [4,] -0.3947810  0.1612459  0.1475424 -0.29520804 -0.81905467 -0.01650651
## [5,] -0.3892610  0.3090877 -0.4219855 -0.06669044  0.02613100 -0.18898771
## [6,] -0.3760945  0.4231899 -0.4060627 -0.08015699  0.35169796  0.24049968
## [7,] -0.3552031  0.3892153  0.7410610  0.32107640  0.24700821 -0.04826992
##             [,7]
## [1,]  0.08893934
## [2,] -0.26565662
## [3,]  0.12660435
## [4,] -0.19521315
## [5,]  0.73076817
## [6,] -0.57150644
## [7,]  0.08208401

Question (b)

women.pca = prcomp(rforw, center=TRUE, scale.=TRUE)
women.pca$rotation[,1:2]

##           PC1        PC2
## V2 -0.3777657  0.4071756
## V3 -0.3832103  0.4136291
## V4 -0.3680361  0.4593531
## V5 -0.3947810 -0.1612459
## V6 -0.3892610 -0.3090877
## V7 -0.3760945 -0.4231899
## V8 -0.3552031 -0.3892153

summary(women.pca)

## Importance of components:
##                           PC1     PC2    PC3     PC4    PC5     PC6
## Standard deviation     2.4099 0.79290 0.5285 0.35292 0.3016 0.23349
## Proportion of Variance 0.8297 0.08981 0.0399 0.01779 0.0130 0.00779
## Cumulative Proportion  0.8297 0.91947 0.9594 0.97717 0.9902 0.99796
##                            PC7
## Standard deviation     0.11959
## Proportion of Variance 0.00204
## Cumulative Proportion  1.00000

Question (c)

The first PC explains 83% of the total sample variation and the second PC explains 9% and the total sample variation after the second PC is 92%.

Question (d)

pca2<- svd(x); pca2

## $d
## [1] 120.5869891  14.5905502   3.7706286   2.4639297   0.8953146   0.3708287
## [7]   0.1808009
## 
## $u
##               [,1]         [,2]          [,3]          [,4]          [,5]
##  [1,]  0.026681074  0.058604086  0.0160996392  0.1176296017  0.2858444484
##  [2,]  0.086684247 -0.156200588  0.0678554520 -0.0605078062  0.1258865787
##  [3,] -0.004511426 -0.106016357 -0.0536623468 -0.0372799730 -0.0366487883
##  [4,]  0.087797994  0.035221002 -0.0243959315  0.1687955331  0.0369066426
##  [5,] -0.170626202 -0.079397016 -0.0768413253  0.1968639016  0.1168377763
##  [6,]  0.052533645 -0.049622456  0.0590101849  0.0734581731  0.0128180870
##  [7,]  0.045551129 -0.107240610 -0.0354972319 -0.0831523595 -0.1740961750
##  [8,]  0.009590292  0.137750213  0.0390375236 -0.0633315612  0.0117992387
##  [9,]  0.119845874 -0.057093597 -0.1379086112  0.0735775676 -0.1144074052
## [10,] -0.010868733 -0.166252257  0.1802048561 -0.1040704997  0.0183994250
## [11,] -0.493604209  0.220825513 -0.2153879593  0.0654359145  0.0261872148
## [12,] -0.089259474 -0.026037615  0.1703296497 -0.1334850145  0.0277725979
## [13,]  0.073441302 -0.215215218  0.1175412448  0.0041451489  0.1936709173
## [14,]  0.034428655  0.097833810 -0.0806739674 -0.0509857932 -0.0230765927
## [15,] -0.107194174 -0.015031543  0.1335368333 -0.0745387120 -0.0649324984
## [16,]  0.048300493 -0.091818590 -0.0223868838  0.0284936924  0.0645130804
## [17,]  0.047976209 -0.227197023  0.0123737997  0.0094275018 -0.1864448552
## [18,]  0.104814586 -0.219320106  0.0470273466  0.0278069224  0.0150761800
## [19,]  0.154083835 -0.041810440  0.0105880627  0.0428668220  0.1612381942
## [20,]  0.003347425 -0.104131997 -0.0170239727  0.0440673899 -0.3617845890
## [21,] -0.149780138  0.126797305 -0.0319172643 -0.1012968217  0.0069172066
## [22,]  0.042772909  0.006150549 -0.0629993255 -0.1203914516  0.0959999254
## [23,] -0.008706477  0.212450673  0.0006592577  0.1180808719 -0.1032274783
## [24,] -0.035942567 -0.102729048 -0.0094755354 -0.0119927261  0.1485681808
## [25,]  0.094834210  0.027792179 -0.0348195690 -0.1637164451  0.1387734040
## [26,] -0.022730502 -0.014883484  0.0404121481  0.0240382007  0.0778907485
## [27,]  0.084514672  0.023686687 -0.0651564887  0.0681189823 -0.0007182163
## [28,]  0.117039584  0.114180378  0.0617737085 -0.0833903159 -0.0754084278
## [29,]  0.125196551  0.098455055  0.0155063579 -0.2317099766  0.1980060738
## [30,]  0.059993977  0.183576164  0.0241942094 -0.0549068265 -0.1361949206
## [31,]  0.063933305  0.382784812  0.0136475922 -0.2699901613 -0.3339393465
## [32,]  0.032089810  0.319266862 -0.0636353269  0.1221919436  0.0673847629
## [33,] -0.129595520 -0.082821009 -0.0615853732 -0.0556052065  0.0317052085
## [34,] -0.113558871  0.077808936 -0.1302642671 -0.0374781021  0.0799074570
## [35,]  0.081535862 -0.124416356  0.2087816837 -0.2696980615 -0.2766266101
## [36,] -0.040601946  0.037148665 -0.0234281538 -0.1452974864  0.0863845594
## [37,]  0.084747805  0.030356021 -0.0474734833  0.0098227907 -0.1526558890
## [38,]  0.059385256  0.031188200  0.0023657619 -0.0721997824 -0.0340316077
## [39,]  0.103079554  0.132701237 -0.0392404327 -0.0834341992 -0.0025409953
## [40,] -0.559541583 -0.301261838 -0.2623896301 -0.3077521988 -0.0373154934
## [41,] -0.100409912  0.089589388 -0.0622731866  0.2998593951 -0.1994262143
## [42,]  0.080683810 -0.123975550 -0.0199373127  0.0343475119 -0.0306890029
## [43,]  0.085370493  0.070701786 -0.0852591863  0.0008402203  0.0569766249
## [44,]  0.093920108 -0.065585095 -0.0470451769 -0.0508736132  0.2747242615
## [45,]  0.104623720 -0.119124214 -0.0628541616  0.1000515327 -0.0843091150
## [46,] -0.318631602  0.049073485  0.7058290292  0.2009564677  0.0055255176
## [47,] -0.010150428  0.232284965  0.2045382779 -0.0806577932  0.2052251021
## [48,]  0.061076011 -0.112071295 -0.0398825410 -0.0422698095  0.0043298671
## [49,]  0.027066351 -0.002183049 -0.0631694884  0.0490897531 -0.0776689004
## [50,]  0.067597459  0.014917039 -0.0425139045 -0.0513640391  0.0977089412
## [51,] -0.049320772 -0.008530645 -0.0203961507  0.3558819134  0.0573359338
## [52,] -0.073128174 -0.022869000  0.1430998265  0.1632642758 -0.1257213944
## [53,]  0.017334836  0.079859480 -0.2398123514  0.1526693467  0.0732902322
## [54,]  0.106289666 -0.148168495 -0.0951059057  0.2895953612 -0.1717398730
##               [,6]         [,7]
##  [1,]  0.159727266 -0.003108278
##  [2,]  0.024384038  0.132806560
##  [3,] -0.141403656 -0.213422823
##  [4,]  0.009530014 -0.083486030
##  [5,] -0.221484457  0.034487063
##  [6,]  0.022860490 -0.162806753
##  [7,] -0.053009375  0.058860471
##  [8,] -0.194766548 -0.125328078
##  [9,] -0.023636503  0.034877310
## [10,]  0.173930303 -0.014975720
## [11,] -0.174151693 -0.011357393
## [12,]  0.159260583 -0.051276333
## [13,] -0.170224479 -0.271401391
## [14,]  0.066769215  0.001661770
## [15,]  0.161446536 -0.154368187
## [16,]  0.183468834  0.045762244
## [17,]  0.017868424 -0.082002258
## [18,] -0.008484552 -0.051589157
## [19,]  0.102306739 -0.056694193
## [20,] -0.126125182  0.141859090
## [21,]  0.125704918  0.062288659
## [22,] -0.040989600  0.031819909
## [23,]  0.076939721  0.114850089
## [24,] -0.208935574  0.041038745
## [25,]  0.032591823  0.213039462
## [26,] -0.013619339  0.196886324
## [27,] -0.104165644  0.004459760
## [28,]  0.132976674  0.028543536
## [29,] -0.131306458  0.090174249
## [30,]  0.130813705  0.273936758
## [31,] -0.164867343 -0.362395224
## [32,]  0.241795359 -0.157015163
## [33,]  0.351328482  0.021022913
## [34,]  0.141735462 -0.248911741
## [35,]  0.083425778  0.215804659
## [36,] -0.068777889 -0.040803930
## [37,]  0.020047891 -0.279546379
## [38,] -0.037560158 -0.114872966
## [39,] -0.040995173  0.302556282
## [40,]  0.076650305  0.030586273
## [41,]  0.071154937  0.119660155
## [42,]  0.013517768 -0.023102329
## [43,] -0.110836950  0.098087070
## [44,] -0.125220256 -0.103525963
## [45,] -0.155631808 -0.018438188
## [46,] -0.133413854  0.017334539
## [47,]  0.046037573  0.067933846
## [48,]  0.061363484 -0.057149295
## [49,] -0.021283547 -0.034364279
## [50,] -0.168238355  0.122504190
## [51,]  0.239743422 -0.042582908
## [52,] -0.260611108  0.103583011
## [53,] -0.155223022  0.186648955
## [54,]  0.127582782 -0.028548934
## 
## $v
##              [,1]        [,2]        [,3]         [,4]          [,5]
## [1,] -0.016123307  0.11485619  0.17348957 -0.292124494  9.325622e-01
## [2,] -0.038657909  0.29039299  0.38670581 -0.794578949 -3.543286e-01
## [3,] -0.107793074  0.93844399 -0.22555878  0.238107150 -1.355429e-03
## [4,] -0.004504024  0.01340703  0.03608601  0.009212813  1.547251e-02
## [5,] -0.013072642  0.03631915  0.26812163  0.068410358 -6.671958e-02
## [6,] -0.039484872  0.07871002  0.83389231  0.470904246 -9.287424e-03
## [7,] -0.992409201 -0.11878027 -0.03025618 -0.009843642 -2.323022e-05
##               [,6]         [,7]
## [1,]  0.0226814395 -0.030430700
## [2,] -0.0832625101  0.028266709
## [3,]  0.0055727214 -0.009943020
## [4,]  0.3768162267  0.925301584
## [5,]  0.8833633010 -0.370349317
## [6,] -0.2649535334  0.069511587
## [7,] -0.0005351329 -0.001613339

#ranking

women[order(women.pca$x[,1], decreasing = TRUE),1]

##  [1] USA    GER    RUS    CHN    FRA    GBR    CZE    POL    ROM    AUS   
## [11] ESP    CAN    ITA    NED    BEL    FIN    AUT    GRE    POR    SUI   
## [21] IRL    BRA    MEX    KEN    TUR    SWE    HUN    NZL    NOR    JPN   
## [31] IND    DEN    COL    ARG    ISR    TPE    CHI    MYA    KOR, S THA   
## [41] BER    KOR, N MAS    LUX    INA    MRI    PHI    CRC    DOM    SIN   
## [51] GUA    PNG    COK    SAM   
## 54 Levels: ARG AUS AUT BEL BER BRA CAN CHI CHN COK COL CRC CZE DEN ... USA

#My intuition tells me that the larger more developed countries with plenty of resources are leading the ranks. Such is the case for the row [1] and row [8], these are highly developed countries. It starts to go down for second tier countries, still developed but less so that the first two rows.

Question (e)

women2<-women
women2[,5:8]<-(women2[,5:8])*60
women2[,2]<-100/(women2[,2])
women2[,3]<-200/(women2[,3])
women2[,4]<-400/(women2[,4])
women2[,5]<-800/(women2[,5])
women2[,6]<-1500/(women2[,6])
women2[,7]<-3000/(women2[,7])
women2[,8]<-42195/(women2[,8])

#PCA Analysis

women.pca2 = prcomp(rforw, center=TRUE)
women.pca2$rotation[,1:2]

##             PC1         PC2
## V2 -0.016123307  0.11485619
## V3 -0.038657909  0.29039299
## V4 -0.107793074  0.93844399
## V5 -0.004504024  0.01340703
## V6 -0.013072642  0.03631915
## V7 -0.039484872  0.07871002
## V8 -0.992409201 -0.11878027

summary(women.pca2)

## Importance of components:
##                            PC1     PC2     PC3     PC4     PC5     PC6
## Standard deviation     16.5639 2.00417 0.51794 0.33845 0.12298 0.05094
## Proportion of Variance  0.9841 0.01441 0.00096 0.00041 0.00005 0.00001
## Cumulative Proportion   0.9841 0.99856 0.99952 0.99993 0.99999 1.00000
##                            PC7
## Standard deviation     0.02483
## Proportion of Variance 0.00000
## Cumulative Proportion  1.00000

rforw2<-data.matrix(women2[,2:8])
women2.pca = prcomp(rforw2, center=TRUE)
women2.pca$rotation[,1:2]

##          PC1         PC2
## V2 0.3102442 -0.37596510
## V3 0.3573948 -0.43376925
## V4 0.3787367 -0.51873227
## V5 0.2993405  0.05313551
## V6 0.3912131  0.21084397
## V7 0.4595909  0.39557338
## V8 0.4227291  0.44458346

summary(women2.pca)

## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6
## Standard deviation     0.8557 0.2934 0.18270 0.12238 0.09408 0.07853
## Proportion of Variance 0.8285 0.0974 0.03777 0.01695 0.01002 0.00698
## Cumulative Proportion  0.8285 0.9259 0.96372 0.98067 0.99068 0.99766
##                            PC7
## Standard deviation     0.04545
## Proportion of Variance 0.00234
## Cumulative Proportion  1.00000

women2[order(women2.pca$x[,1], decreasing = TRUE),1]

##  [1] USA    CHN    RUS    GER    GBR    FRA    ROM    POL    CZE    AUS   
## [11] ESP    CAN    ITA    NED    IRL    POR    KEN    FIN    BEL    SUI   
## [21] MEX    AUT    GRE    TUR    HUN    NOR    BRA    NZL    SWE    JPN   
## [31] DEN    IND    COL    ARG    KOR, S ISR    MYA    CHI    TPE    KOR, N
## [41] LUX    MAS    THA    INA    BER    MRI    PHI    CRC    DOM    SIN   
## [51] GUA    PNG    COK    SAM   
## 54 Levels: ARG AUS AUT BEL BER BRA CAN CHI CHN COK COL CRC CZE DEN ... USA

#The components between the standardized predictors and the converted measures are very similar. The numbers for PC1 in the standardized predictors (problem b) do not change that much (.8297) in comparison to PC1 in the converted speed measures (.8258). I prefer the converted measures anaylsis because it captures 93% of the variation after the second principal component compared to the standardized principal component at 92%.

Question (f)

Load the data

men<-read.table(file="Data-HW4-track-men.dat", header=FALSE, quote="", sep="")
rform<-data.matrix(men[,2:9])

centermen = function(v){v - mean(v)}
x = apply(rform, 2, center)

(f)-(a)

cor(rform)

##           V2        V3        V4        V5        V6        V7        V8
## V2 1.0000000 0.9147554 0.8041147 0.7119388 0.7657919 0.7398803 0.7147921
## V3 0.9147554 1.0000000 0.8449159 0.7969162 0.7950871 0.7613028 0.7479519
## V4 0.8041147 0.8449159 1.0000000 0.7677488 0.7715522 0.7796929 0.7657481
## V5 0.7119388 0.7969162 0.7677488 1.0000000 0.8957609 0.8606959 0.8431074
## V6 0.7657919 0.7950871 0.7715522 0.8957609 1.0000000 0.9165224 0.9013380
## V7 0.7398803 0.7613028 0.7796929 0.8606959 0.9165224 1.0000000 0.9882324
## V8 0.7147921 0.7479519 0.7657481 0.8431074 0.9013380 0.9882324 1.0000000
## V9 0.6764873 0.7211157 0.7126823 0.8069657 0.8777788 0.9441466 0.9541630
##           V9
## V2 0.6764873
## V3 0.7211157
## V4 0.7126823
## V5 0.8069657
## V6 0.8777788
## V7 0.9441466
## V8 0.9541630
## V9 1.0000000

eigen(cor(rform))

## eigen() decomposition
## $values
## [1] 6.703289951 0.638410110 0.227524494 0.205849181 0.097577441 0.070687912
## [7] 0.046942050 0.009718862
## 
## $vectors
##            [,1]        [,2]         [,3]        [,4]        [,5]
## [1,] -0.3323877 -0.52939911 -0.343859303  0.38074525  0.29967117
## [2,] -0.3460511 -0.47039050  0.003786104  0.21702322 -0.54143422
## [3,] -0.3391240 -0.34532929  0.067060507 -0.85129980  0.13298631
## [4,] -0.3530134  0.08945523  0.782711152  0.13427911 -0.22728254
## [5,] -0.3659849  0.15365241  0.244270040  0.23302034  0.65162403
## [6,] -0.3698204  0.29475985 -0.182863147 -0.05462441  0.07181636
## [7,] -0.3659489  0.33360619 -0.243980694 -0.08706927 -0.06133263
## [8,] -0.3542779  0.38656085 -0.334632969  0.01812115 -0.33789097
##             [,6]       [,7]         [,8]
## [1,] -0.36203713  0.3476470 -0.065701445
## [2,]  0.34859224 -0.4398969  0.060755403
## [3,]  0.07708385  0.1135553 -0.003469726
## [4,] -0.34130845  0.2588830 -0.039274027
## [5,]  0.52977961 -0.1470362 -0.039745509
## [6,] -0.35914382 -0.3283202  0.705684585
## [7,] -0.27308617 -0.3511133 -0.697181715
## [8,]  0.37516986  0.5941571  0.069316891

(f)-(b)

men.pca = prcomp(rform, center=TRUE, scale.=TRUE)
men.pca$rotation[,1:2]

##           PC1         PC2
## V2 -0.3323877 -0.52939911
## V3 -0.3460511 -0.47039050
## V4 -0.3391240 -0.34532929
## V5 -0.3530134  0.08945523
## V6 -0.3659849  0.15365241
## V7 -0.3698204  0.29475985
## V8 -0.3659489  0.33360619
## V9 -0.3542779  0.38656085

summary(men.pca)

## Importance of components:
##                           PC1    PC2     PC3     PC4    PC5     PC6
## Standard deviation     2.5891 0.7990 0.47700 0.45371 0.3124 0.26587
## Proportion of Variance 0.8379 0.0798 0.02844 0.02573 0.0122 0.00884
## Cumulative Proportion  0.8379 0.9177 0.94615 0.97188 0.9841 0.99292
##                            PC7     PC8
## Standard deviation     0.21666 0.09858
## Proportion of Variance 0.00587 0.00121
## Cumulative Proportion  0.99879 1.00000

(f)-(c)

The first PC (PC1) spans the direction where most of the variation is, in this case 84%. While the second PC (PC2) captures the direction where the 2nd most variation is at 8%. The total sample variation, given PC1 and PC2 is 92%.

(f)-(d)

men[order(men.pca$x[,1], decreasing = TRUE),1]

##  [1] U.S.A.         GreatBritain   Kenya          France        
##  [5] Australia      Italy          Brazil         Germany       
##  [9] Portugal       Canada         Belgium        Poland        
## [13] Russia         Spain          Japan          Switzerland   
## [17] Norway         Netherlands    Mexico         NewZealand    
## [21] Denmark        Greece         Hungary        Finland       
## [25] Ireland        Sweden         Austria        Chile         
## [29] China          CzechRepublic  Romania        Argentina     
## [33] Korea,South    India          Columbia       Turkey        
## [37] Israel         Mauritius      Luxembourg     Taiwan        
## [41] DominicanRepub Bermuda        Thailand       Indonesia     
## [45] CostaRica      Korea,North    Malaysia       Guatemala     
## [49] Philippines    Myanmar(Burma) PapuaNewGuinea Singapore     
## [53] Samoa          CookIslands   
## 54 Levels: Argentina Australia Austria Belgium Bermuda Brazil ... U.S.A.

#My intiution tells me that the highly developed countries lead the pack. I also happen to know that Kenyan always win marathons. So, even though my intuition leans to wealthy countries, Kenya has a long history of producing fast runners. So, this does not surprise me and it makes sense to me. ### (f)-(e)

men2<-men
men2[,5:9]<-(men2[,5:9])*60
men2[,2]<-100/(men2[,2])
men2[,3]<-200/(men2[,3])
men2[,4]<-400/(men2[,4])
men2[,5]<-800/(men2[,5])
men2[,6]<-1500/(men2[,6])
men2[,7]<-5000/(men2[,7])
men2[,8]<-10000/(men2[,8])
men2[,9]<-42195/(men2[,9])

rform2<-data.matrix(men2[,2:9])
men2.pca = prcomp(rform2, center=TRUE)
men2.pca$rotation[,1:2]

##          PC1         PC2
## V2 0.2439701  0.43237108
## V3 0.3113827  0.52345617
## V4 0.3168151  0.46905827
## V5 0.2775048  0.03280175
## V6 0.3642621 -0.06284374
## V7 0.4276861 -0.26134677
## V8 0.4209180 -0.30988613
## V9 0.4163706 -0.38688033

summary(men2.pca)

## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6
## Standard deviation     0.7029 0.2150 0.11795 0.11542 0.08673 0.07582
## Proportion of Variance 0.8444 0.0790 0.02378 0.02277 0.01286 0.00983
## Cumulative Proportion  0.8444 0.9234 0.94713 0.96990 0.98276 0.99258
##                            PC7     PC8
## Standard deviation     0.05675 0.03348
## Proportion of Variance 0.00550 0.00192
## Cumulative Proportion  0.99808 1.00000

men2[order(men2.pca$x[,1], decreasing = TRUE),1]

##  [1] U.S.A.         Kenya          GreatBritain   France        
##  [5] Australia      Italy          Belgium        Germany       
##  [9] Portugal       Brazil         Spain          Canada        
## [13] Russia         Poland         Japan          Switzerland   
## [17] Netherlands    Mexico         Norway         NewZealand    
## [21] Denmark        Ireland        Finland        Greece        
## [25] Hungary        Sweden         Austria        China         
## [29] Romania        Chile          CzechRepublic  Argentina     
## [33] Korea,South    India          Turkey         Columbia      
## [37] Israel         Luxembourg     Mauritius      Taiwan        
## [41] Korea,North    CostaRica      Thailand       Guatemala     
## [45] DominicanRepub Philippines    Indonesia      Bermuda       
## [49] Malaysia       Myanmar(Burma) Singapore      PapuaNewGuinea
## [53] Samoa          CookIslands   
## 54 Levels: Argentina Australia Austria Belgium Bermuda Brazil ... U.S.A.

Problem 2

Load the data

pollution<-read.table(file="Data-HW4-pollution.dat", header=FALSE, quote="", sep="")
names(pollution) = c("wind","solar","co","no","no2","o3","hc")

Question (a)

cov(pollution)

##             wind       solar         co         no        no2         o3
## wind   2.5000000  -2.7804878 -0.3780488 -0.4634146 -0.5853659 -2.2317073
## solar -2.7804878 300.5156794  3.9094077 -1.3867596  6.7630662 30.7909408
## co    -0.3780488   3.9094077  1.5220674  0.6736353  2.3147503  2.8217189
## no    -0.4634146  -1.3867596  0.6736353  1.1823461  1.0882695 -0.8106852
## no2   -0.5853659   6.7630662  2.3147503  1.0882695 11.3635308  3.1265970
## o3    -2.2317073  30.7909408  2.8217189 -0.8106852  3.1265970 30.9785134
## hc     0.1707317   0.6236934  0.1416957  0.1765389  1.0441347  0.5946574
##              hc
## wind  0.1707317
## solar 0.6236934
## co    0.1416957
## no    0.1765389
## no2   1.0441347
## o3    0.5946574
## hc    0.4785134

Question (b)

# center columns
center = function(v){v - mean(v)}
Xc = apply(pollution, 2, center)

# factor loading by principal component solution
fit.pca_pollution = eigen(cor(Xc))

#eigenvectors
v = fit.pca_pollution$vectors
rownames(v) = colnames(pollution)

#factor loadings
L1 = v[,1] * sqrt(fit.pca_pollution$values[1])
L2 = v[,2] * sqrt(fit.pca_pollution$values[2])

round(L1,3)

##   wind  solar     co     no    no2     o3     hc 
##  0.362 -0.314 -0.842 -0.577 -0.761 -0.496 -0.488

round(L2,3)

##   wind  solar     co     no    no2     o3     hc 
##  0.328 -0.620 -0.008  0.512  0.235 -0.667  0.362

# commonalities
# m = 1
round(L1^2, 3)

##  wind solar    co    no   no2    o3    hc 
## 0.131 0.099 0.710 0.333 0.580 0.246 0.238

# m = 2
round(L1^2 + L2^2, 3)

##  wind solar    co    no   no2    o3    hc 
## 0.239 0.483 0.710 0.595 0.635 0.692 0.370

Question (c)

sum(L1^2) / length(L1)

## [1] 0.3338261

sum(L1^2) / length(L1) + sum(L2^2) / length(L1)

## [1] 0.5318262

Question (d)

varimax(cbind(L1, L2), normalize = FALSE)

## $loadings
## 
## Loadings:
##       L1     L2    
## wind   0.160  0.461
## solar        -0.695
## co    -0.735 -0.412
## no    -0.752  0.171
## no2   -0.781 -0.160
## o3    -0.114 -0.824
## hc    -0.602       
## 
##                   L1    L2
## SS loadings    2.117 1.606
## Proportion Var 0.302 0.229
## Cumulative Var 0.302 0.532
## 
## $rotmat
##            [,1]      [,2]
## [1,]  0.8768458 0.4807718
## [2,] -0.4807718 0.8768458

##Interpretion of varimax ##L1 #co, no, and no2 items load relatively high to component L1 (absolute values: .735, .752 & .781, respectively). I would say that hc also loads relatively high to component L1 all of which are in contrast to wind which loads to .160 to component L1. ##L2 #item o3 loads the highest on the component L2. Yet, wind (.461), solar (.695) and co(.412) also load high on the component L1, all of which in contrast to no(.171) and no2(-.160)

Homework Set 4

PH245: Introduction to multivariate statistics

Sergio Martinez

December 9, 2019

Problem 1

Load the data

Question (a)

Question (b)

Question (c)

The first PC explains 83% of the total sample variation and the second PC explains 9% and the total sample variation after the second PC is 92%.

Question (d)

Question (e)

Question (f)

Load the data

(f)-(a)

(f)-(b)

(f)-(c)

The first PC (PC1) spans the direction where most of the variation is, in this case 84%. While the second PC (PC2) captures the direction where the 2nd most variation is at 8%. The total sample variation, given PC1 and PC2 is 92%.

(f)-(d)

Problem 2

Load the data

Question (a)

Question (b)

Question (c)

Question (d)