We start the PCA by indexing the rows by name (the state names are the row index); this lets us label each observation later when we interpret the PCA.
names(USArrests)
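As a minimal sketch of this step, assuming the built-in USArrests data frame (whose row names are already the 50 state names), the labels can be pulled into a vector for later use. The variable name `states` is an illustrative choice, not something defined in these notes.

```r
# USArrests ships with base R (datasets package); its row names are the state names.
states <- row.names(USArrests)

# First few labels and the dimensions of the data (observations x variables).
head(states)
dim(USArrests)
```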
[1] "Murder" "Assault" "UrbanPop" "Rape"
We apply summary functions such as mean and var to each column; here, the column variances
apply(USArrests, 2, var)
    Murder    Assault   UrbanPop       Rape
  18.97047 6945.16571  209.51878   87.72916
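The variances above differ by orders of magnitude, so an unscaled PCA would be dominated by Assault. A small sketch, assuming only the built-in USArrests data, that puts the column means next to the variances (the names `col_means` and `col_vars` are illustrative):

```r
# Column means and variances of the four variables.
col_means <- apply(USArrests, 2, mean)
col_vars  <- apply(USArrests, 2, var)

# Side-by-side view of both summaries.
round(cbind(mean = col_means, var = col_vars), 2)

# Ratio of the largest to the smallest variance.
max(col_vars) / min(col_vars)
```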
We fit the PCA to USArrests with prcomp, standardizing each variable to unit variance
pr.out=prcomp(USArrests, scale=TRUE)
pr.out$scale
   Murder   Assault  UrbanPop      Rape
 4.355510 83.337661 14.474763  9.366385
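The scale values are the column standard deviations (for example, sqrt(18.97047) is about 4.3555). A minimal sketch, assuming pr.out comes from prcomp with scale=TRUE, that checks this directly:

```r
# Assumed fit: centre and scale each variable before the PCA.
pr.out <- prcomp(USArrests, scale = TRUE)

# The stored centre and scale are the column means and standard deviations.
pr.out$center
pr.out$scale
all.equal(pr.out$scale, apply(USArrests, 2, sd))
```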
The rotation matrix holds the eigenvectors (the principal component loadings), one per column
pr.out$rotation
                PC1        PC2        PC3         PC4
Murder   -0.5358995  0.4181809 -0.3412327  0.64922780
Assault  -0.5831836  0.1879856 -0.2681484 -0.74340748
UrbanPop -0.2781909 -0.8728062 -0.3780158  0.13387773
Rape     -0.5434321 -0.1673186  0.8177779  0.08902432
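To make the eigenvector interpretation concrete, here is a sketch that compares the loadings with an eigendecomposition of the correlation matrix. It assumes the fit used scale=TRUE; the name `eig` is illustrative. Signs may differ between the two results, so the comparison uses absolute values.

```r
pr.out <- prcomp(USArrests, scale = TRUE)

# PCA on standardized data corresponds to an eigendecomposition of the correlation matrix.
eig <- eigen(cor(USArrests))

# Eigenvectors are defined only up to sign, so compare absolute values.
max(abs(abs(eig$vectors) - abs(pr.out$rotation)))

# The loading vectors have unit length and are mutually orthogonal.
round(crossprod(pr.out$rotation), 10)
```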
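We plot the first two principal components; scale=0 makes the arrows represent the loadings
biplot(pr.out, scale=0)
Principal components are unique only up to sign, so flipping both the loadings and the scores gives an equivalent, mirror-image biplot
pr.out$rotation=-pr.out$rotation
pr.out$x=-pr.out$x
biplot(pr.out, scale=0)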
The sdev component gives the standard deviation of the data along each principal component, that is, how much spread each eigenvector captures
pr.out$sdev
[1] 1.5748783 0.9948694 0.5971291 0.4164494
pr.var=pr.out$sdev^2
pr.var
[1] 2.4802416 0.9897652 0.3565632 0.1734301
pve=pr.var/sum(pr.var)
pve
[1] 0.62006039 0.24744129 0.08914080 0.04335752
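The chain from sdev to the proportion of variance explained can be checked in a few lines. This is a sketch assuming a fit with scale=TRUE; with standardized variables the component variances sum to the number of variables (4 here).

```r
pr.out <- prcomp(USArrests, scale = TRUE)
pr.var <- pr.out$sdev^2

# sdev is the standard deviation of each column of scores in pr.out$x.
all.equal(pr.out$sdev, unname(apply(pr.out$x, 2, sd)))

# Total variance of the standardized data.
sum(pr.var)

# Proportion of variance explained by each component.
pve <- pr.var / sum(pr.var)
round(pve, 4)
```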
plot(pve,xlab="Principal Component",ylab="Proportion of Variance Explained", ylim=c(0,1) )
plot(cumsum(pve), xlab="Principal Component", ylab="Cumulative Proportion of Variance Explained", ylim=c(0,1))
The cumsum function computes the cumulative sum of the elements of a vector; applied to pve, it gives the cumulative proportion of variance explained
a=c(1,2,8,-3)
cumsum(a)
[1] 1 3 11 8
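One common use of the cumulative sum is to pick how many components to keep. The sketch below assumes a fit with scale=TRUE and a hypothetical 85% cut-off; the threshold and the names `cum_pve`, `threshold` and `n_keep` are illustrative, not part of the original analysis.

```r
pr.out <- prcomp(USArrests, scale = TRUE)
pve <- pr.out$sdev^2 / sum(pr.out$sdev^2)
cum_pve <- cumsum(pve)

# Hypothetical cut-off: keep enough components to explain at least 85% of the variance.
threshold <- 0.85
n_keep <- which(cum_pve >= threshold)[1]
n_keep
```

With the values shown above (0.6201 after one component, 0.8675 after two), the first two components are enough to pass this cut-off.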
summary(pr.out)
Importance of components:
                          PC1    PC2     PC3     PC4
Standard deviation     1.5749 0.9949 0.59713 0.41645
Proportion of Variance 0.6201 0.2474 0.08914 0.04336
Cumulative Proportion  0.6201 0.8675 0.95664 1.00000
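The same table is available programmatically through the importance element of the summary object. A short sketch, assuming a fit with scale=TRUE (the name `imp` is illustrative):

```r
pr.out <- prcomp(USArrests, scale = TRUE)

# Matrix with one row per statistic and one column per component.
imp <- summary(pr.out)$importance
rownames(imp)

# Pull out a single row, e.g. the cumulative proportion of variance.
imp["Cumulative Proportion", ]
```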