We first load and look at the data.

##           V1         V2         V3         V4         V5
## 1  0.0130338 -0.0078431 -0.0031889 -0.0447693  0.0052151
## 2  0.0084862  0.0166886 -0.0062100  0.0119560  0.0134890
## 3 -0.0179153 -0.0086393  0.0100360  0.0000000 -0.0061428
## 4  0.0215589 -0.0034858  0.0174353 -0.0285917 -0.0069534
## 5  0.0108225  0.0037167 -0.0101345  0.0291900  0.0409751
## 6  0.0101713 -0.0121978 -0.0083768  0.0137083  0.0029895

0. Contents

This report contains four parts:

1. PC method - We perform factor analysis manually using covariance and correlation matrix and compute loadings and the psi matrix and interpret them.
2. Iterative PC method - We perform factor analysis using “fa” function available in “psych” package which allows the Iterative PC method. Here also we calculate loadings and compute communality and uniqueness with and without varimax rotation. We interpret the results.
3. MLE method - Here we use base R function “factanal” which uses MLE method. We do all the above.
4. Factor scores - We calculate factor scores and check the assumptions.

We begin factor analysis now.

1. PC method

First we perform Factor Analysis using covariance matrix.

Calculating the covariance matrix and its EV-EV pairs.

##              V1           V2           V3           V4           V5
## V1 4.332695e-04 0.0002756679 1.590265e-04 6.411929e-05 8.896616e-05
## V2 2.756679e-04 0.0004387172 1.799737e-04 1.814512e-04 1.232623e-04
## V3 1.590265e-04 0.0001799737 2.239722e-04 7.341348e-05 6.054612e-05
## V4 6.411929e-05 0.0001814512 7.341348e-05 7.224964e-04 5.082772e-04
## V5 8.896616e-05 0.0001232623 6.054612e-05 5.082772e-04 7.656742e-04

## eigen() decomposition
## $values
## [1] 0.0013676780 0.0007011596 0.0002538024 0.0001426026 0.0001188868
## 
## $vectors
##           [,1]       [,2]        [,3]       [,4]        [,5]
## [1,] 0.2228228  0.6252260  0.32611218  0.6627590  0.11765952
## [2,] 0.3072900  0.5703900 -0.24959014 -0.4140935 -0.58860803
## [3,] 0.1548103  0.3445049 -0.03763929 -0.4970499  0.78030428
## [4,] 0.6389680 -0.2479475 -0.64249741  0.3088689  0.14845546
## [5,] 0.6509044 -0.3218478  0.64586064 -0.2163758 -0.09371777

To check number of factors required we plot the eigenvalues and check screeplot.

It looks like 2 factors is good. We check percentages of variance explained by the five principal components in the vector shown below.

## [1]  52.92607  80.05936  89.88095  95.39935 100.00000

So 2 principal components explain 80 percent of the variation.

Next we estimate the loading matrix using the two first two principal components. That is the matrix of eigenvectors scaled by square roots of eigenvalues. This is the 5x2 L matrix.

##             [,1]         [,2]
## [1,] 0.008240463  0.016555621
## [2,] 0.011364238  0.015103596
## [3,] 0.005725215  0.009122290
## [4,] 0.023630398 -0.006565506
## [5,] 0.024071832 -0.008522342

Intepretation

We see the first factor loads on all companies in positive amounts whereas the second factor loads on the first three companies in positive amounts but in negative amounts on the last two companies.

This implies that the first factor could be a general health of economy factor that effects all companies in positive way and the second factor may be an industry specific factor that effects the first three companies negatively and the last two companies positively.

Magnitudes are slightly difficult to interpret because they do not represent correlations as we have used the covariance matrix for factor analysis. But relative magnitudes and signs give some idea

Next we calculate communalities as shown in the vector below.

## [1] 0.0003419938 0.0003572645 0.0001159943 0.0006015016 0.0006520834

and the uniquenesses (the psi matrix) shown below.

##              [,1]         [,2]         [,3]         [,4]         [,5]
## [1,] 9.127563e-05 0.000000e+00 0.0000000000 0.0000000000 0.0000000000
## [2,] 0.000000e+00 8.145269e-05 0.0000000000 0.0000000000 0.0000000000
## [3,] 0.000000e+00 0.000000e+00 0.0001079779 0.0000000000 0.0000000000
## [4,] 0.000000e+00 0.000000e+00 0.0000000000 0.0001209948 0.0000000000
## [5,] 0.000000e+00 0.000000e+00 0.0000000000 0.0000000000 0.0001135907

Again magnitudes are slightly difficult to interpret. It will be better to use the correlation matrix for factor analysis.

Now we repeat the process using Correlation matrix

Calculating the correlation matrix and its EV-EV pairs.

##           V1        V2        V3        V4        V5
## V1 1.0000000 0.6322878 0.5104973 0.1146019 0.1544628
## V2 0.6322878 1.0000000 0.5741424 0.3222921 0.2126747
## V3 0.5104973 0.5741424 1.0000000 0.1824992 0.1462067
## V4 0.1146019 0.3222921 0.1824992 1.0000000 0.6833777
## V5 0.1544628 0.2126747 0.1462067 0.6833777 1.0000000

## eigen() decomposition
## $values
## [1] 2.4372731 1.4070127 0.5005127 0.4000316 0.2551699
## 
## $vectors
##           [,1]       [,2]        [,3]       [,4]        [,5]
## [1,] 0.4690832  0.3680070  0.60431522  0.3630228  0.38412160
## [2,] 0.5324055  0.2364624  0.13610618 -0.6292079 -0.49618794
## [3,] 0.4651633  0.3151795 -0.77182810  0.2889658  0.07116948
## [4,] 0.3873459 -0.5850373 -0.09336192 -0.3812515  0.59466408
## [5,] 0.3606821 -0.6058463  0.10882629  0.4934145 -0.49755167

To check number of factors required we plot the eigenvalues and check screeplot.

It looks like 2 factors is good as only two eigenvalues larger than 1. We check percentages of variance explained by all five principal components in the vector shown below.

## [1]  48.74546  76.88572  86.89597  94.89660 100.00000

So 2 principal components explain 76% of the variance.

Next we estimate the loading matrix using the two first two principal components. That is the matrix of eigenvectors scaled by square roots of eigenvalues. This is the 5x2 L matrix.

##           [,1]       [,2]
## [1,] 0.7323218  0.4365209
## [2,] 0.8311791  0.2804859
## [3,] 0.7262022  0.3738582
## [4,] 0.6047155 -0.6939569
## [5,] 0.5630885 -0.7186401

Intepretation

Here the interpretation remains the same as discussed above but now we can make sense of the magnitudes of loadings because they now represent correlations between the factor and the variables. This is due to the fact that we are now using the correlation matrix for factor analysis.

Next, calculating communalities

## [1] 0.7268458 0.7695311 0.6671396 0.8472571 0.8335122

and uniqenesses (the psi matrix)

##           [,1]      [,2]      [,3]      [,4]      [,5]
## [1,] 0.2731542 0.0000000 0.0000000 0.0000000 0.0000000
## [2,] 0.0000000 0.2304689 0.0000000 0.0000000 0.0000000
## [3,] 0.0000000 0.0000000 0.3328604 0.0000000 0.0000000
## [4,] 0.0000000 0.0000000 0.0000000 0.1527429 0.0000000
## [5,] 0.0000000 0.0000000 0.0000000 0.0000000 0.1664878

We obtain reasonably high communalities and low uniquenesses suggesting factor model is good.

2. The Iterative PC method

We use the fa function and perform factor analysis using the correlation matrix without rotation first.

## Warning: package 'psych' was built under R version 3.6.3

## maximum iteration exceeded

We look at loadings first.

## 
## Loadings:
##    PA1    PA2   
## V1  0.625 -0.429
## V2  0.777 -0.342
## V3  0.591 -0.332
## V4  0.704  0.709
## V5  0.509  0.455
## 
##                  PA1   PA2
## SS loadings    2.097 1.121
## Proportion Var 0.419 0.224
## Cumulative Var 0.419 0.644

Intepretation

We have similar interpretation as in the basic PC method. The first factor is the general health of the economy factor as it loads moderately posetively on all the companies

The second factor now loads negatively in moderate amounts on the first three companies and positively in moderate amounts on the last two. So the signs have reversed

But separation of company types remains the same.

We also note that two factors explain 64.4% of the variation.

Next we check the uniqueness and communality for each variable.

##          V1          V2          V3          V4          V5 
## 0.424490154 0.280101509 0.540531664 0.002670401 0.534195813

##        V1        V2        V3        V4        V5 
## 0.5755098 0.7198985 0.4594683 0.9973296 0.4658042

Company 4 is well explained by this factor analysis followed by company 2. The others are only moderately well explained by the factor model.

Next we use varimax and check what changes occur.

## maximum iteration exceeded

We look at loadings first.

## 
## Loadings:
##    PA1   PA2  
## V1 0.757      
## V2 0.821 0.216
## V3 0.669 0.108
## V4 0.110 0.993
## V5 0.115 0.673
## 
##                  PA1   PA2
## SS loadings    1.719 1.499
## Proportion Var 0.344 0.300
## Cumulative Var 0.344 0.644

Interpretation

Now there are some changes. We see that all loadings are positive now.

Conclusion here is that again the first factor has positive moderate loading on the first three companies and the second factor has moderate positive loading on the last two companies especialy the fourth company.

So the rotation has removed the overall economy factor and alloted one factor for each company type

The separation of company types is same as before rotation ie the first three are of one type and the last two are of another type.

Next we check uniqueness and communality.

##          V1          V2          V3          V4          V5 
## 0.424490154 0.280101509 0.540531664 0.002670401 0.534195813

##        V1        V2        V3        V4        V5 
## 0.5755098 0.7198985 0.4594683 0.9973296 0.4658042

Similar observation as before rotation. Company 4 is well explained by this factor analysis followed by company 2. The others are only moderately explained by the factor model.

3. The MLE method

We know the number of factors should be 2. We use base R function factanal which uses MLE method. We do factor analysis without rotation and check the loadings.

## 
## Loadings:
##    Factor1 Factor2
## V1  0.121   0.754 
## V2  0.328   0.786 
## V3  0.188   0.650 
## V4  0.997         
## V5  0.685         
## 
##                Factor1 Factor2
## SS loadings      1.622   1.610
## Proportion Var   0.324   0.322
## Cumulative Var   0.324   0.646

Interpretation

We get all positive loadings which is different from the PC and iterative PC method without rotation. But similar to iterative PC method after varimax

However separation of company type is same. First three companies are of one type and the last two companies are of another type.

One thing two note is that the factors have reversed roles. Factor 2 now affects the first three companies and Factor 1 on the last two companies.

64 percent of variation explained by 2 factors.

We check uniqueness (1 - communality)

##        V1        V2        V3        V4        V5 
## 0.4165374 0.2746902 0.5420233 0.0050000 0.5298429

We see that they indicate the same thing as in the case of iterative PC.

Next we perform FA with rotation varimax and check the loadings.

## 
## Loadings:
##    Factor1 Factor2
## V1 0.763          
## V2 0.819   0.232  
## V3 0.668   0.108  
## V4 0.113   0.991  
## V5 0.108   0.677  
## 
##                Factor1 Factor2
## SS loadings      1.725   1.507
## Proportion Var   0.345   0.301
## Cumulative Var   0.345   0.646

Interpretation

Same result (separation of company types) as before varimax. But factors have switched roles again.

64 percent is explained by the two factors as before varimax

Uniqueness (and communality) also remains identical as before varimax as shown below.

##        V1        V2        V3        V4        V5 
## 0.4165374 0.2746902 0.5420233 0.0050000 0.5298429

4. Factor scores

In this section we estimate factor scores by two methods - regression and WLS - after performing factor analyis by MLE method using factanal function.

First we do it by regression method and display some of the scores.

##          Factor1    Factor2
## [1,]  0.16535864 -1.8342740
## [2,]  0.36753184  0.2555092
## [3,] -0.39519052 -0.1079285
## [4,]  0.62520403 -1.2878906
## [5,] -0.06003873  0.9494523
## [6,] -0.37607023  0.3996291

To check assumptions first we check bivariate normality of the factor scores. It passes the three tests - mardia, hz and royston - available in the MVN package via MVN function.

## Warning: package 'MVN' was built under R version 3.6.3

## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2

## sROC 0.1-2 loaded

##              Test           Statistic           p value Result
## 1 Mardia Skewness   0.745285405187422 0.945629116871027    YES
## 2 Mardia Kurtosis -0.0196531043319442  0.98432010086735    YES
## 3             MVN                <NA>              <NA>    YES

##            Test        HZ   p value MVN
## 1 Henze-Zirkler 0.4749564 0.6592693 YES

##      Test         H   p value MVN
## 1 Royston 0.8014243 0.6698429 YES

Next we calculate the sample mean of scores of both factors and check the mean vector. Both are very nearly zero as seen below.

## [1] 2.967346e-18 2.554559e-17

Next we calculate the sample covariance matrix which is reasonably close to the 2x2 Identity matrix. So they are uncorrelated.

##            Factor1    Factor2
## Factor1 0.81757553 0.02145502
## Factor2 0.02145502 0.99244004

Below we present normal qqplots of both factors individually and they are close to standard normal. They also pass the Shapiro Wilk test individually. Therefore the two factors can be said to be indeed independent as they are uncorrelated and bivariate normal.

## 
##  Shapiro-Wilk normality test
## 
## data:  fa.ml$scores[, 1]
## W = 0.99118, p-value = 0.7423

## 
##  Shapiro-Wilk normality test
## 
## data:  fa.ml$scores[, 2]
## W = 0.98649, p-value = 0.3831

Next we repeat the above on factor scores by WLS method and see some of the scores.

##          Factor1    Factor2
## [1,]  0.25089936 -1.8536707
## [2,]  0.44303382  0.2478778
## [3,] -0.48078772 -0.0983568
## [4,]  0.79921292 -1.3149790
## [5,] -0.09859658  0.9588163
## [6,] -0.47081640  0.4128516

To check assumptions first we check bivariate normality of the factor scores. It passes the three tests - mardia, hz and royston - available in the MVN package via MVN function.

##              Test           Statistic           p value Result
## 1 Mardia Skewness   0.745285405187427 0.945629116871026    YES
## 2 Mardia Kurtosis -0.0196531043319431 0.984320100867351    YES
## 3             MVN                <NA>              <NA>    YES

##            Test        HZ   p value MVN
## 1 Henze-Zirkler 0.4749564 0.6592693 YES

##      Test         H   p value MVN
## 1 Royston 0.9058111 0.6357782 YES

Next we calculate the sample mean of scores of both factors. Both are very nearly zero as seen below.

## [1] 3.938180e-17 2.824978e-19

Next we calculate the sample covariance matrix which is reasonably close to the 2x2 Identity matrix. So they are uncorrelated.

##             Factor1     Factor2
## Factor1  1.22382288 -0.02645716
## Factor2 -0.02645716  1.00818951

## 
##  Shapiro-Wilk normality test
## 
## data:  fa.ml$scores[, 1]
## W = 0.99163, p-value = 0.7779

## 
##  Shapiro-Wilk normality test
## 
## data:  fa.ml$scores[, 2]
## W = 0.98594, p-value = 0.3497

Stock Market Data

by Shirshendu Mukherji

5/22/2020

0. Contents

1. PC method

First we perform Factor Analysis using covariance matrix.

Now we repeat the process using Correlation matrix

2. The Iterative PC method

3. The MLE method

4. Factor scores

The End