PCA is used in exploratory data analysis and for building predictive models. It is commonly used for dimensionality reduction while minimizing information loss. The principal components are a collection of vectors in which vector \(i\) is orthogonal to the first \(i-1\) vectors. In PCA, we do not need to split the variables into independent and dependent variables.
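As a quick illustration of that orthogonality, here is a minimal sketch using base R's prcomp on the built-in USArrests data (not part of the original example):

```r
# Fit PCA on standardized variables
pc <- prcomp(USArrests, scale. = TRUE)
# The loading vectors are orthonormal, so their cross-product is the identity
round(crossprod(pc$rotation), 10)
```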
We would like our data to come from a multivariate normal distribution with zero covariances. Bartlett's test checks whether the observed correlation matrix \(R\) diverges significantly from the identity matrix. \(H_0\): the variables are orthogonal (the correlation matrix does NOT diverge from the identity matrix). If the \(p\)-value is greater than 0.05, we fail to reject \(H_0\) and no dimension reduction is needed.
The example from “data” shows a \(p\)-value of 0.15, which means the variables are already (approximately) linearly independent, so we do not need to reduce the dimension.
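The chunk that produced the output below is not shown; it was presumably a call to cortest.bartlett from the psych package (df = 3 matches the three correlation pairs of a three-variable dataset). A minimal sketch, assuming the dataset is named data:

```r
library(psych)  # provides cortest.bartlett()
# Bartlett's sphericity test on the correlation matrix of 'data' (name assumed)
cortest.bartlett(cor(data), n = nrow(data))
```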
## $chisq
## [1] 5.252329
##
## $p.value
## [1] 0.1542258
##
## $df
## [1] 3
The KMO index checks whether the sample is large enough to factorize the original variables efficiently. Ideally there should be 150+ observations, with at least five rows per variable (a stricter rule of thumb: 20 observations per variable).
KMO levels:

- 0.00 to 0.49: unacceptable
- 0.50 to 0.59: miserable
- 0.60 to 0.69: mediocre
- 0.70 to 0.79: middling
- 0.80 to 0.89: meritorious
- 0.90 to 1.00: marvelous
The example below shows a Bartlett test \(p\)-value \(< 0.05\), which means we should apply PCA. This is consistent with the data structure: x4 and x1, and x5 and x2, are highly correlated by construction. The KMO is about 0.53, which means we would need more data for a better analysis.
library(psych)  # for cortest.bartlett() and KMO()
set.seed(0)
n <- 50
data1 <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rnorm(n)
)
# x4 and x5 are noisy copies of x1 and x2, so the pairs are highly correlated
data1$x4 <- data1$x1 + runif(n, min = -0.5, max = 0.5)
data1$x5 <- data1$x2 + runif(n, min = -0.5, max = 0.5)
# Test the correlation structure
r <- cor(data1)
cortest.bartlett(r, n = nrow(data1))
## $chisq
## [1] 220.7146
##
## $p.value
## [1] 7.573688e-42
##
## $df
## [1] 10
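The overall measure of sampling adequacy printed below was presumably obtained with psych::KMO; a sketch:

```r
KMO(r)$MSA  # overall Kaiser-Meyer-Olkin measure of sampling adequacy
```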
## [1] 0.52928
\(\det(R) > 0\): the variables are linearly independent (a determinant of exactly zero would mean an exact linear dependence among them).
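As a quick check (not in the original), the determinant of the correlation matrix above is positive but small, reflecting the near-linear dependence of x4 on x1 and x5 on x2:

```r
det(r)  # positive but close to 0: near-collinear columns
```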
From the cumulative proportions of variance explained below, the first three components capture about 98% of the variance, so three vectors are good enough for our predictions:
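The cumulative proportions below match what summary() reports for a prcomp fit; a sketch, assuming standardized PCA on data1:

```r
pc1 <- prcomp(data1, center = TRUE, scale. = TRUE)
# Cumulative proportion of variance explained by the first k components
summary(pc1)$importance["Cumulative Proportion", ]
```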
## PC1 PC2 PC3 PC4 PC5
## 0.47783 0.81072 0.97964 0.99027 1.00000
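The message below is the characteristic output of psych::fa.parallel; a sketch of the presumed call:

```r
fa.parallel(data1, fa = "pc")  # scree plot plus parallel analysis for components
```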
## Parallel analysis suggests that the number of factors = NA and the number of components = 0
From the loading coefficients, we can see that x1 and x4, and x2 and x5, are closely related:
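The three loading vectors printed below (one per retained component) could have been produced with, for example, psych::principal; a sketch assuming three unrotated components:

```r
p3 <- principal(data1, nfactors = 3, rotate = "none")
p3$loadings  # loadings of each variable on the first three components
```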
## x1 x2 x3 x4 x5
## 0.80973 0.66308 0.37330 0.80689 0.70949
## x1 x2 x3 x4 x5
## -0.50919 0.73101 -0.36921 -0.51877 0.68218
## x1 x2 x3 x4 x5
## -0.244616 0.024876 0.851002 -0.233529 0.073751
As another example, consider a job-satisfaction questionnaire with the following items:

q <- data.frame(
  var = c("x1", "x2", "x3", "x4", "x5", "x6",
          "x7", "x8", "x9", "x10", "x11", "x12"),
  desc = c("My job pays me well.",
           "I have my career well planned out.",
           "I would do anything to win my boss’ approval.",
           "This is the best job I have ever had.",
           "I find my work tedious.",
           "My job provides me with a sense of achievement.",
           "I perform well in competitive situations.",
           "I think it’s unfair to promote a person simply because he is more senior.",
           "I am happy with my job.",
           "I hate to be in a responsible position with several people reporting to me.",
           "I am quite content with what I have achieved with my job.",
           "I would leave my job for another offer that pays better.")
)
q
We can see that x1 and x12 are similar questions; next time we could collect responses and use a model to test this!
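A minimal sketch of how that test might look, using simulated (not real) responses in which x1 and x12 are driven by the same latent pay-satisfaction score; all names and numbers here are illustrative assumptions:

```r
library(psych)
set.seed(1)
n <- 200
pay <- rnorm(n)  # hypothetical latent "pay satisfaction" score
resp <- data.frame(
  x1  = pay + rnorm(n, sd = 0.4),   # "My job pays me well."
  x12 = -pay + rnorm(n, sd = 0.4),  # reverse-keyed: would leave for better pay
  x7  = rnorm(n)                    # unrelated item for contrast
)
cortest.bartlett(cor(resp), n = n)      # expect p < 0.05: correlations to exploit
principal(resp, nfactors = 1)$loadings  # x1 and x12 load on the same component
```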
ref: Mark Newman’s class material