Analisis de componentes principales para el archivo “data_pca”

Ahora Normalicemos los datos

data2 <- scale(data_pca[,-16])
View(data2)

Hagamos una prueba de correlacion, para observar si es posible aplicar un analisis de componenetes principales, el resultado de dicha correlacion tiene que ser cercano a cero

det(cor(data2))
## [1] 0.004667778

El resultado obtenido es casi igual a cero por lo que es posible realizar un analisis de componentes principales, ahora analizaremos las cargas de cada componente segun la variable

pca <- princomp(data2)
pca$loadings
## 
## Loadings:
##     Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9 Comp.10
## x1   0.141  0.320  0.200         0.604  0.137  0.102  0.199         0.109 
## x2  -0.419  0.180        -0.248  0.145         0.236 -0.110        -0.570 
## x3  -0.274  0.177  0.259  0.483 -0.131                      -0.207  0.367 
## x4         -0.307  0.447 -0.223  0.129 -0.167 -0.291        -0.399 -0.152 
## x5  -0.236 -0.410 -0.354  0.102                                    -0.163 
## x6   0.132        -0.203 -0.628         0.178               -0.112  0.340 
## x7   0.112  0.340 -0.464               -0.345  0.151 -0.194  0.136        
## x8   0.385 -0.246         0.153        -0.132 -0.268  0.382  0.424 -0.258 
## x9   0.142  0.246 -0.284               -0.510 -0.336  0.157 -0.536        
## x10 -0.386  0.110        -0.342               -0.331  0.327  0.273  0.383 
## x11         0.137 -0.255  0.122 -0.300  0.528         0.574 -0.364        
## x12  0.102  0.330               -0.226  0.346 -0.643 -0.434        -0.209 
## x13                0.232 -0.223 -0.648 -0.301  0.246                      
## x14  0.175  0.410  0.305 -0.182                0.158  0.154        -0.256 
## x15  0.521 -0.153                       0.135  0.173 -0.256 -0.249  0.151 
##     Comp.11 Comp.12 Comp.13 Comp.14 Comp.15
## x1   0.134   0.135           0.588         
## x2   0.203  -0.186                  -0.459 
## x3  -0.232  -0.412           0.113  -0.382 
## x4                   0.576                 
## x5  -0.582   0.204           0.470         
## x6  -0.171  -0.566           0.147         
## x7                   0.662                 
## x8          -0.385                  -0.355 
## x9                  -0.367                 
## x10          0.367                  -0.367 
## x11                  0.237                 
## x12                          0.223         
## x13  0.136                   0.524         
## x14 -0.693                  -0.248         
## x15          0.344                  -0.611 
## 
##                Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
## SS loadings     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
## Proportion Var  0.067  0.067  0.067  0.067  0.067  0.067  0.067  0.067  0.067
## Cumulative Var  0.067  0.133  0.200  0.267  0.333  0.400  0.467  0.533  0.600
##                Comp.10 Comp.11 Comp.12 Comp.13 Comp.14 Comp.15
## SS loadings      1.000   1.000   1.000   1.000   1.000   1.000
## Proportion Var   0.067   0.067   0.067   0.067   0.067   0.067
## Cumulative Var   0.667   0.733   0.800   0.867   0.933   1.000

Con la siguiente Linea de codigo es posible checar la proporcion de la varianza

summary(pca)
## Importance of components:
##                           Comp.1    Comp.2    Comp.3    Comp.4     Comp.5
## Standard deviation     1.6220588 1.4501268 1.3332930 1.2434264 1.15529908
## Proportion of Variance 0.1762864 0.1408957 0.1191069 0.1035919 0.08942821
## Cumulative Proportion  0.1762864 0.3171821 0.4362890 0.5398809 0.62930907
##                            Comp.6     Comp.7     Comp.8    Comp.9    Comp.10
## Standard deviation     1.05569426 0.90471763 0.88908929 0.8622762 0.80999883
## Proportion of Variance 0.07467272 0.05484181 0.05296347 0.0498171 0.04395967
## Cumulative Proportion  0.70398179 0.75882360 0.81178707 0.8616042 0.90556384
##                          Comp.11    Comp.12    Comp.13   Comp.14     Comp.15
## Standard deviation     0.7012045 0.59518243 0.53958339 0.4662240 0.234552581
## Proportion of Variance 0.0329439 0.02373482 0.01950755 0.0145638 0.003686091
## Cumulative Proportion  0.9385077 0.96224255 0.98175010 0.9963139 1.000000000

con esto es posible observar que la mayor proporcion de varianza se encuentra con valores mas altos es con dos componentes, en el tercero esta ya se ve mas reducida. A continuacion revisaremos la varianza y los eigenvalores

fviz_eig(pca,choice = "variance")

fviz_eig(pca, choice = "eigenvalue")

Tras esta ejecucion es posible observar, que efectivamente el componente uno y el componente dos, aportan mayor varianza

Analisis Grafico

Representaciones Factoriales y su representacion:

fviz_pca_ind(pca,
             col.ind = "cos2",
             gradient.cols=c("red", "yellow", "green"),
             repel = FALSE)

Aqui es posible analizar que el coseno cuadrado indica la proporcion de la varianza de la variable es explicada por el componente principal, aquellos valores cercanos a 1, se encuentran bien representados, y de manera gradual a como va incrementando el color hasta llegar a rojo, menor representacion tienen las variables, por lo que es posible explicar que la mayoria de las variables no se encuentran bien representadas por el componente principal, pero esto depende del grupo de datos y de su estudio.

Grafico de Cargas Factoriales

fviz_pca_var(pca,
             col.var = "contrib",
             gradient.cols = c('red', 'yellow', 'green'),
             repel = FALSE)

Es posible observar que las variables x10,x2,x3, x8,x15,y x6 representan un a dimension y las variables x5, x4, x11, x12, x7, x13,x9,x1,x14 representan una dimension totalmente diferente.

Usaremos un biplot para visualizar las puntuaciones

fviz_pca_biplot(pca,
                col.var = "red",
                col.ind = "black")

En este analisis bidimensional es posible observar que sujetos como el 8 en el cuadrante 4, tienden a darle mas importancia a variables como x15, y menor importancia a las variables como x2, observemos a continuacion

data2[8,]
##         x1         x2         x3         x4         x5         x6         x7 
## -0.3361816 -2.2846106 -0.9850069 -0.3205148 -0.3698024  1.1582924  0.7202647 
##         x8         x9        x10        x11        x12        x13        x14 
##  0.9307831  1.6138598 -1.5768108 -0.8876292 -0.9577966  0.2812746 -0.3031942 
##        x15 
##  2.3942047

En el cuadrante 2, sujetos como el 90 le dan mas importancia a variables como x13, y menor importancia a variables como x4

data2[90,]
##          x1          x2          x3          x4          x5          x6 
##  0.77083066 -0.30511876 -0.17814631 -1.08357443 -0.92382349 -0.05815972 
##          x7          x8          x9         x10         x11         x12 
##  1.96482907 -0.08019998 -0.17110473  0.02277015 -0.32292431  0.78273451 
##         x13         x14         x15 
##  2.06467889  1.22680995  0.85446103

Analisis de Proporcion Spss

psych::cor.plot(data2)

Es posible observar que la correlacion entre las variables es muy baja, ya que la mayoria son negativas entre ellas, aquellas variabes con mas correlacion entre si son x7, con x9, las variables x1 con x14, ademas de x15 con x8 por mencionar algunas.

A continuacio aplicaremos rotacion varimax para asi ajustar las cargas de modo que cada variable tenga una carga alta en un solo componente haciendo que la estructura sea mas simple y clara

pca2 <- psych::principal(data2, nfactors = 2, residuals =FALSE, rotate ="varimax",
                         scores = TRUE, oblique.scores = FALSE, method = "regresion",
                         use = "pairwise", cor = "cor", weight = NULL)
pca2
## Principal Components Analysis
## Call: psych::principal(r = data2, nfactors = 2, residuals = FALSE, 
##     rotate = "varimax", scores = TRUE, oblique.scores = FALSE, 
##     method = "regresion", use = "pairwise", cor = "cor", weight = NULL)
## Standardized loadings (pattern matrix) based upon correlation matrix
##       RC1   RC2    h2   u2 com
## x1   0.04  0.52 0.269 0.73 1.0
## x2  -0.73  0.00 0.533 0.47 1.0
## x3  -0.51  0.08 0.265 0.73 1.0
## x4   0.09 -0.45 0.206 0.79 1.1
## x5  -0.14 -0.70 0.503 0.50 1.1
## x6   0.22  0.03 0.048 0.95 1.0
## x7  -0.01  0.53 0.278 0.72 1.0
## x8   0.71 -0.11 0.519 0.48 1.0
## x9   0.09  0.42 0.181 0.82 1.1
## x10 -0.64 -0.08 0.419 0.58 1.0
## x11 -0.02  0.20 0.042 0.96 1.0
## x12 -0.02  0.51 0.258 0.74 1.0
## x13  0.12  0.12 0.030 0.97 2.0
## x14  0.05  0.66 0.436 0.56 1.0
## x15  0.87  0.10 0.768 0.23 1.0
## 
##                        RC1  RC2
## SS loadings           2.57 2.18
## Proportion Var        0.17 0.15
## Cumulative Var        0.17 0.32
## Proportion Explained  0.54 0.46
## Cumulative Proportion 0.54 1.00
## 
## Mean item complexity =  1.1
## Test of the hypothesis that 2 components are sufficient.
## 
## The root mean square of the residuals (RMSR) is  0.14 
##  with the empirical chi square  770.91  with prob <  1.5e-115 
## 
## Fit based upon off diagonal values = 0.51

Con ello es posible observar que con dos componentes se explica la maxima varianza en el modelo, podrian usarse hasta 4 componentes principales, pero es mas factible y optima la representacion de unicamente dos componentes, por lo que solo se utilizaran dos componenetes. A continuacion una matriz de coeficientes sobre los dos componentes

pca2$weights[,1]
##            x1            x2            x3            x4            x5 
##  0.0008960604 -0.2851378248 -0.2012827708  0.0481355724 -0.0332723838 
##            x6            x7            x8            x9           x10 
##  0.0837472648 -0.0202342745  0.2817042861  0.0202814144 -0.2486671202 
##           x11           x12           x13           x14           x15 
## -0.0151416330 -0.0236242067  0.0447827540 -0.0013696313  0.3369550463
pca2$weights[,2]
##           x1           x2           x3           x4           x5           x6 
##  0.236717095  0.021936576  0.052252404 -0.208155734 -0.315680597  0.008795705 
##           x7           x8           x9          x10          x11          x12 
##  0.243092448 -0.071914925  0.189412732 -0.015124597  0.095096723  0.234550137 
##          x13          x14          x15 
##  0.051813948  0.301661558  0.017665734

Asi bien podemos obtener nuevas variables cuya principal caracteristica es que son ortogonales, este conjunto de variables se redujo unicamente a dos variables, explicadas de la siguiente manera

pca2$scores
##                 RC1         RC2
##   [1,] -1.663181432 -3.47805844
##   [2,] -0.363562031 -0.25080118
##   [3,]  0.355426863 -0.37948303
##   [4,]  0.131447901  0.18641872
##   [5,] -1.728595344  0.07626033
##   [6,]  0.886144950 -0.74574263
##   [7,] -1.791375398 -1.52348776
##   [8,]  2.471554555  0.10651225
##   [9,] -0.148288881  0.89193562
##  [10,]  1.108288252 -1.05150773
##  [11,]  0.917048312  0.64596530
##  [12,]  2.136851958  2.01301453
##  [13,]  0.194033287 -0.81137340
##  [14,] -0.869718074  1.23715076
##  [15,]  0.044982518  0.47987229
##  [16,]  0.084967686  0.01060598
##  [17,]  0.793574599  1.89480493
##  [18,] -0.119102206 -1.67230798
##  [19,] -0.080510505 -0.13905438
##  [20,] -0.151593533 -0.34167904
##  [21,]  1.509186645 -2.50404963
##  [22,]  0.025517491  0.91758816
##  [23,]  0.168060496 -0.09080835
##  [24,]  0.040314586 -0.06384400
##  [25,]  0.105099516  0.32604240
##  [26,] -1.141803540 -0.53291409
##  [27,] -0.327694558  0.57688400
##  [28,] -1.745004932 -2.11925524
##  [29,] -1.886388920  1.36596967
##  [30,] -1.408426430  0.53267094
##  [31,]  2.255198855 -0.01305522
##  [32,] -0.953839808  0.61028878
##  [33,]  0.462452562 -0.81267787
##  [34,] -2.186063854  0.95664069
##  [35,]  1.335112083  1.44962001
##  [36,]  0.263479483 -1.17372941
##  [37,] -1.630151831 -0.26740084
##  [38,]  0.187192927  1.77358413
##  [39,]  0.259357062  0.24221556
##  [40,] -0.264116689  1.50745427
##  [41,] -0.649503681 -0.32114967
##  [42,]  1.195849679  0.52694049
##  [43,] -1.364604928  0.60535110
##  [44,] -0.826674665  0.27129448
##  [45,]  0.632862315  0.13811486
##  [46,] -1.279442465 -0.34897872
##  [47,]  0.521722448  1.43964201
##  [48,]  2.057561982 -1.56602701
##  [49,] -0.694814644 -1.64132533
##  [50,]  0.198683933 -0.66074941
##  [51,] -0.096684160  0.08420017
##  [52,] -0.080792492 -0.76697582
##  [53,]  0.523728599  0.79783410
##  [54,] -0.881160378 -0.09244674
##  [55,]  0.889791101  1.67006207
##  [56,]  1.516734928  0.47128242
##  [57,]  0.205675362  1.81559398
##  [58,] -1.211172109  1.48363719
##  [59,]  0.285751751  0.37764667
##  [60,] -1.882018623  2.45253675
##  [61,]  0.001242478  0.49272459
##  [62,]  0.229126806 -0.74076759
##  [63,] -0.990237939 -1.88936199
##  [64,]  0.712992703  0.27760981
##  [65,]  0.403242858  0.08942812
##  [66,]  0.008525364  0.80490821
##  [67,] -0.500863674 -1.03679961
##  [68,] -1.688732565  0.62609716
##  [69,] -0.879917608 -0.18706950
##  [70,]  0.249449544  1.91471624
##  [71,] -0.283464801 -0.03073237
##  [72,] -0.091621925  0.44957214
##  [73,]  0.861822200 -0.04294464
##  [74,]  0.078035665  0.04721447
##  [75,] -2.292492422  0.38474563
##  [76,]  1.728119167 -0.42831342
##  [77,]  0.094645513 -0.07866099
##  [78,]  1.021409038  1.20287383
##  [79,]  0.264695360 -0.75680623
##  [80,]  0.112260616 -0.07658599
##  [81,]  1.084519001 -0.09232373
##  [82,] -0.532661091  0.25672190
##  [83,]  1.791813516  1.26091207
##  [84,]  0.473204015  0.81810229
##  [85,] -0.684905270  1.32510017
##  [86,]  0.940595870 -1.30853778
##  [87,] -1.244494813  0.69632648
##  [88,]  0.889744074  0.12173111
##  [89,] -1.680528953  0.34984533
##  [90,]  0.390870811  1.77882636
##  [91,] -1.149286816  0.20564591
##  [92,] -0.883995431 -0.05331987
##  [93,] -0.056006952 -0.05773772
##  [94,]  0.622921653  0.20797725
##  [95,]  1.837861719 -0.71191950
##  [96,]  1.146132246 -0.98789143
##  [97,]  0.831304077 -0.65399469
##  [98,] -0.167267711 -0.09652808
##  [99,]  0.157844745  0.57471758
## [100,]  1.289681991  0.33736814
## [101,] -0.009518897 -0.66442164
## [102,] -1.131694227  0.10420478
## [103,]  0.227820260 -1.77320271
## [104,]  0.343636435  0.28669628
## [105,] -1.015254442  0.49414549
## [106,]  1.227695664 -1.13583579
## [107,] -0.653524974  0.42617784
## [108,]  0.935352727 -2.19226831
## [109,] -0.945359579  0.75278169
## [110,]  0.798493248  0.59527668
## [111,] -0.182434415 -0.18049814
## [112,] -0.108828086  1.36744887
## [113,]  0.068642652 -1.90638751
## [114,] -0.505808483  0.17625129
## [115,] -1.313357721 -1.42694362
## [116,] -1.379364286 -2.12241890
## [117,]  0.555141327  1.98190981
## [118,] -0.667412652  0.91860021
## [119,]  1.337478042 -0.39756339
## [120,] -1.656553170  0.25584312
## [121,] -1.977257823  0.41552427
## [122,]  1.343502445  0.14766768
## [123,] -0.540823678 -0.96358806
## [124,]  0.625136393 -0.57698845
## [125,]  0.718927880  1.62673405
## [126,] -0.195155692  0.89523542
## [127,]  0.910692513 -1.41201180
## [128,] -0.540040515 -2.07462041
## [129,]  0.600264251 -0.68426890
## [130,]  0.177979990 -0.82184466
## [131,] -0.839385964 -0.75379213
## [132,] -0.588186259  0.63876238
## [133,] -1.024527343  0.16243794
## [134,] -1.389595773  1.45835963
## [135,]  0.667635654 -0.39040295
## [136,] -0.056720053 -0.55261685
## [137,] -1.362855606 -0.78420294
## [138,]  1.337345287  0.80777575
## [139,] -0.366397759 -0.51005677
## [140,]  0.729524091 -1.13077035
## [141,]  0.681572460 -0.22331639
## [142,] -0.278730217  1.31442187
## [143,]  1.471062711 -0.47770100
## [144,] -0.132713631  0.34929955
## [145,] -0.496178664 -0.33851983
## [146,]  0.600822286 -0.78493424
## [147,]  0.272593112  0.06594810
## [148,] -1.931720581 -1.85024125
## [149,] -0.350702532  0.85353899
## [150,] -0.374593439  2.01477013
## [151,]  0.068601982 -0.60043400
## [152,]  0.645537289 -1.94493688
## [153,] -0.626378015 -0.21433197
## [154,]  1.481035203 -0.99665310
## [155,]  2.207929960  0.12640557
## [156,]  0.381041005  0.09880233
## [157,]  0.390060709 -0.33821528
## [158,]  1.363127868  1.53326647
## [159,] -0.178529026 -0.63247144
## [160,] -0.430415472  1.27189265
## [161,] -1.830687715 -0.07070883
## [162,] -0.844544670 -1.39959915
## [163,] -0.365501252  0.09688615
## [164,]  0.827537887 -0.31323932
## [165,]  0.094443091  0.66355454
## [166,]  1.734208120  0.18287931
## [167,]  1.101412087  0.10001821
## [168,]  0.139050937  0.97362543
## [169,] -1.270278461  0.58877299
## [170,]  0.989258008 -0.84731891
## [171,] -1.185035259  0.98452974
## [172,] -0.966644026 -0.08551493
## [173,]  0.055322471 -1.15963929
## [174,]  0.185883835 -1.01999761
## [175,]  1.376528781 -0.05288940
## [176,]  0.879527355 -0.30554160
## [177,] -0.025087450  0.86804996
## [178,]  1.501287798 -0.41856491
## [179,]  0.446654247  0.48227162
## [180,] -2.502882395 -0.34160096
## [181,] -0.719805613  1.22757385
## [182,]  0.513419818  0.34717255
## [183,]  1.488350553 -0.65111277
## [184,]  0.688111149  0.86961481
## [185,] -0.248192785 -0.04273693
## [186,] -0.774447598 -0.33076244
## [187,]  0.213469367  0.14410373
## [188,] -0.451083343 -0.50725241
## [189,]  0.759350447  2.18536730
## [190,]  0.832397164  0.14776045
## [191,]  0.119632381  0.53035188
## [192,] -0.225847994 -0.95650810
## [193,] -0.180965925 -1.27041253
## [194,]  0.374494531 -1.40665842
## [195,]  0.238388497  0.69270164
## [196,] -1.276731027  1.06037324
## [197,]  0.236446028 -0.88610798
## [198,]  0.742677578 -0.91861134
## [199,] -2.084771954 -0.63576583
## [200,]  0.063069234 -0.36840522

Base de Datos “Poblacion USA (2020)”

library(readxl)
## Warning: package 'readxl' was built under R version 4.4.3
PoblacionUSA <- read_excel("C:/Users/Ines Kimberly/Downloads/PoblacionUSA.xlsm")
View(PoblacionUSA)

A continuacion normalizaremos unicamente los datos correspondientes al año 2020 y 2021

Hagamos una prueba de correlacion, para observar si es posible aplicar un analisis de componenetes principales, el resultado de dicha correlacion tiene que ser cercano a cero

det(cor(data2020))
## [1] -8.80948e-41
det(cor(data2021))
## [1] -4.747386e-25

El resultado es practicamente cero para amboa años, por lo que es factible utilizar el pca Ahora realizaremos un analisis del factor de adecuacion Muestral de Kayser-Mayer

psych::KMO(data2020)
## Error in solve.default(r) : 
##   sistema es computacionalmente singular: número de condición recíproco = 3.40445e-18
## matrix is not invertible, image not found
## Kaiser-Meyer-Olkin factor adequacy
## Call: psych::KMO(r = data2020)
## Overall MSA =  0.5
## MSA for each item = 
##    Census Resident Total Population - AB:Qr-1-2020 
##                                                0.5 
##    Resident Total Population Estimate - Jul-1-2020 
##                                                0.5 
##                Net Domestic Migration - Jul-1-2020 
##                                                0.5 
## Federal/Civilian Movement from Abroad - Jul-1-2020 
##                                                0.5 
##           Net International Migration - Jul-1-2020 
##                                                0.5 
##                         Period Births - Jul-1-2020 
##                                                0.5 
##                         Period Deaths - Jul-1-2020 
##                                                0.5 
## Resident Under 65 Population Estimate - Jul-1-2020 
##                                                0.5 
##  Resident 65 Plus Population Estimate - Jul-1-2020 
##                                                0.5 
##                              Residual - Jul-1-2020 
##                                                0.5
psych::KMO(data2021)
## Error in solve.default(r) : 
##   sistema es computacionalmente singular: número de condición recíproco = 4.27828e-18
## matrix is not invertible, image not found
## Kaiser-Meyer-Olkin factor adequacy
## Call: psych::KMO(r = data2021)
## Overall MSA =  0.5
## MSA for each item = 
##    Resident Total Population Estimate - Jul-1-2021 
##                                                0.5 
##                Net Domestic Migration - Jul-1-2021 
##                                                0.5 
## Federal/Civilian Movement from Abroad - Jul-1-2021 
##                                                0.5 
##           Net International Migration - Jul-1-2021 
##                                                0.5 
##                         Period Births - Jul-1-2021 
##                                                0.5 
##                         Period Deaths - Jul-1-2021 
##                                                0.5 
## Resident Under 65 Population Estimate - Jul-1-2021 
##                                                0.5 
##  Resident 65 Plus Population Estimate - Jul-1-2021 
##                                                0.5 
##                              Residual - Jul-1-2021 
##                                                0.5

Para ambos años presenta un valor igual a 0.5, lo que nos indica un valor regular de KMO Ahora analizaremos las cargas de cada componente segun la variable

pca2020 <- princomp(data2020)
pca2020$loadings
## 
## Loadings:
##                                                    Comp.1 Comp.2 Comp.3 Comp.4
## Census Resident Total Population - AB:Qr-1-2020     0.367                     
## Resident Total Population Estimate - Jul-1-2020     0.367                     
## Net Domestic Migration - Jul-1-2020                       -0.660  0.669  0.256
## Federal/Civilian Movement from Abroad - Jul-1-2020  0.278 -0.343  0.124 -0.799
## Net International Migration - Jul-1-2020            0.347        -0.102 -0.209
## Period Births - Jul-1-2020                          0.365                     
## Period Deaths - Jul-1-2020                          0.355         0.122  0.292
## Resident Under 65 Population Estimate - Jul-1-2020  0.367                     
## Resident 65 Plus Population Estimate - Jul-1-2020   0.354         0.149  0.288
## Residual - Jul-1-2020                                     -0.654 -0.698  0.255
##                                                    Comp.5 Comp.6 Comp.7 Comp.8
## Census Resident Total Population - AB:Qr-1-2020            0.191  0.349       
## Resident Total Population Estimate - Jul-1-2020            0.193  0.346       
## Net Domestic Migration - Jul-1-2020                -0.176  0.107              
## Federal/Civilian Movement from Abroad - Jul-1-2020  0.323 -0.214              
## Net International Migration - Jul-1-2020           -0.878 -0.143         0.170
## Period Births - Jul-1-2020                                 0.547 -0.666 -0.347
## Period Deaths - Jul-1-2020                          0.266 -0.323 -0.397  0.658
## Resident Under 65 Population Estimate - Jul-1-2020         0.299  0.386  0.118
## Resident 65 Plus Population Estimate - Jul-1-2020         -0.590        -0.633
## Residual - Jul-1-2020                                                         
##                                                    Comp.9 Comp.10
## Census Resident Total Population - AB:Qr-1-2020     0.807  0.203 
## Resident Total Population Estimate - Jul-1-2020    -0.540  0.634 
## Net Domestic Migration - Jul-1-2020                              
## Federal/Civilian Movement from Abroad - Jul-1-2020               
## Net International Migration - Jul-1-2020                         
## Period Births - Jul-1-2020                                       
## Period Deaths - Jul-1-2020                                       
## Resident Under 65 Population Estimate - Jul-1-2020 -0.237 -0.740 
## Resident 65 Plus Population Estimate - Jul-1-2020         -0.101 
## Residual - Jul-1-2020                                            
## 
##                Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
## SS loadings       1.0    1.0    1.0    1.0    1.0    1.0    1.0    1.0    1.0
## Proportion Var    0.1    0.1    0.1    0.1    0.1    0.1    0.1    0.1    0.1
## Cumulative Var    0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
##                Comp.10
## SS loadings        1.0
## Proportion Var     0.1
## Cumulative Var     1.0
pca2021 <- princomp(data2021)
pca2021$loadings
## 
## Loadings:
##                                                    Comp.1 Comp.2 Comp.3 Comp.4
## Resident Total Population Estimate - Jul-1-2021     0.392                     
## Net Domestic Migration - Jul-1-2021                       -0.690  0.585  0.260
## Federal/Civilian Movement from Abroad - Jul-1-2021 -0.299  0.292  0.528 -0.698
## Net International Migration - Jul-1-2021            0.373        -0.239 -0.242
## Period Births - Jul-1-2021                          0.391                     
## Period Deaths - Jul-1-2021                          0.379         0.328       
## Resident Under 65 Population Estimate - Jul-1-2021  0.392                     
## Resident 65 Plus Population Estimate - Jul-1-2021   0.380         0.335       
## Residual - Jul-1-2021                               0.150 -0.639 -0.296 -0.617
##                                                    Comp.5 Comp.6 Comp.7 Comp.8
## Resident Total Population Estimate - Jul-1-2021            0.217  0.475       
## Net Domestic Migration - Jul-1-2021                -0.324                     
## Federal/Civilian Movement from Abroad - Jul-1-2021 -0.148  0.192              
## Net International Migration - Jul-1-2021           -0.783 -0.299 -0.108  0.174
## Period Births - Jul-1-2021                                 0.627 -0.595 -0.309
## Period Deaths - Jul-1-2021                          0.367 -0.225 -0.350  0.657
## Resident Under 65 Population Estimate - Jul-1-2021         0.317  0.534  0.123
## Resident 65 Plus Population Estimate - Jul-1-2021   0.156 -0.526        -0.651
## Residual - Jul-1-2021                               0.315                     
##                                                    Comp.9
## Resident Total Population Estimate - Jul-1-2021     0.746
## Net Domestic Migration - Jul-1-2021                      
## Federal/Civilian Movement from Abroad - Jul-1-2021       
## Net International Migration - Jul-1-2021                 
## Period Births - Jul-1-2021                               
## Period Deaths - Jul-1-2021                               
## Resident Under 65 Population Estimate - Jul-1-2021 -0.660
## Resident 65 Plus Population Estimate - Jul-1-2021        
## Residual - Jul-1-2021                                    
## 
##                Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
## SS loadings     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
## Proportion Var  0.111  0.111  0.111  0.111  0.111  0.111  0.111  0.111  0.111
## Cumulative Var  0.111  0.222  0.333  0.444  0.556  0.667  0.778  0.889  1.000

Con la siguiente Linea de codigo es posible checar la proporcion de la varianza

summary(pca2020)
## Importance of components:
##                           Comp.1    Comp.2    Comp.3    Comp.4     Comp.5
## Standard deviation     2.6907367 1.1607536 0.8280683 0.6125685 0.33421800
## Proportion of Variance 0.7384865 0.1374296 0.0699411 0.0382745 0.01139357
## Cumulative Proportion  0.7384865 0.8759161 0.9458572 0.9841317 0.99552529
##                             Comp.6       Comp.7       Comp.8 Comp.9 Comp.10
## Standard deviation     0.204077430 0.0391252568 2.629360e-02      0       0
## Proportion of Variance 0.004248055 0.0001561401 7.051803e-05      0       0
## Cumulative Proportion  0.999773342 0.9999294820 1.000000e+00      1       1

Para 2020 es posible observar que con un solo componente es mas que suficciente ya que ahi se encuentra explicado un 74% de la varianza,se tendria que seguir analizando para ver si es recomendable usar otro componente mas.

summary(pca2021)
## Importance of components:
##                           Comp.1    Comp.2     Comp.3     Comp.4     Comp.5
## Standard deviation     2.5056381 1.2841609 0.66974585 0.55374490 0.31095905
## Proportion of Variance 0.7115319 0.1868945 0.05083674 0.03475179 0.01095883
## Cumulative Proportion  0.7115319 0.8984264 0.94926311 0.98401490 0.99497372
##                             Comp.6       Comp.7       Comp.8 Comp.9
## Standard deviation     0.205937451 0.0345666035 2.728378e-02      0
## Proportion of Variance 0.004806493 0.0001354163 8.436586e-05      0
## Cumulative Proportion  0.999780218 0.9999156341 1.000000e+00      1

De manera similar para 2021, tambien parece ser suficiente con un solo componente y que con ese se explica aproximadamenteel 70% de la varianza, se tendria que seguir analizando para ver si es recomendable usar otro componente mas. A continuacion revisaremos la varianza y los eigenvalores para 2020

fviz_eig(pca2020,choice = "variance")

fviz_eig(pca2020, choice = "eigenvalue")

Con este grafico es posible observar que la mayoria de la varianza se eencuentra explicada por un solo componente principal Ahora para 2021

fviz_eig(pca2021,choice = "variance")

fviz_eig(pca2021, choice = "eigenvalue")

Para este año tambien es osible observarq ue la mayoria de la varianza se explica de manera sufieciente con un solo componente.

Analisis Grafico

Representaciones Factoriales y su representacion: 2020

fviz_pca_ind(pca2020,
             col.ind = "cos2",
             gradient.cols=c("red", "yellow", "green"),
             repel = FALSE)

Aqui es posible analizar que el coseno cuadrado indica la proporcion de la varianza de la variable es explicada por el componente principal, aquellos valores cercanos a 1, se encuentran bien representados, y de manera gradual a como va incrementando el color hasta llegar a rojo, menor representacion tienen las variables, por lo que es posible explicar que la mayoria de las variables se encuentran bien representadas por el componente principal, a excepcion de el numero 2, el 48, y el 26, pero son minoria, asi que no afectan a nuestro estudio.

2021

fviz_pca_ind(pca2021,
             col.ind = "cos2",
             gradient.cols=c("red", "yellow", "green"),
             repel = FALSE)

Para el año 2021, es posible observar que el unico dato qu no se explica correctamente es el dato 2, per solo es un dato en todo el onjunto de datos, por lo que carece de importancia. Grafico de Cargas Factoriales 2020

fviz_pca_var(pca2020,
             col.var = "contrib",
             gradient.cols = c('red', 'yellow', 'green'),
             repel = FALSE)

Es posible observar que las variables se pueden agrupar en dos dimensiones, donde una esta conformada por Migracion, Residual y Federal para el mes de julio y otra dimension para los demas.

2021

fviz_pca_var(pca2021,
             col.var = "contrib",
             gradient.cols = c('red', 'yellow', 'green'),
             repel = FALSE)

Para el año 2021 se divide en tres dimensiones, donde una esta conformada por Residual y Net domestic migration, otra dimension esta conformada por net from abroad, y la otra por las demas.

Usaremos un biplot para visualizar las puntuaciones 2020

fviz_pca_biplot(pca2020,
                col.var = "red",
                col.ind = "black")

Es posible observar que sujetos como el 11 tienden a tener mas preferencia por Residual en comparacion con otras variables, o el sujeto 10 se eencuentra mas inclinado por movement from abroad

data2020[11,]
##    Census Resident Total Population - AB:Qr-1-2020 
##                                         0.43289508 
##    Resident Total Population Estimate - Jul-1-2020 
##                                         0.43626956 
##                Net Domestic Migration - Jul-1-2020 
##                                         1.81333937 
## Federal/Civilian Movement from Abroad - Jul-1-2020 
##                                         1.13149799 
##           Net International Migration - Jul-1-2020 
##                                         0.02083766 
##                         Period Births - Jul-1-2020 
##                                         0.58378211 
##                         Period Deaths - Jul-1-2020 
##                                         0.33145645 
## Resident Under 65 Population Estimate - Jul-1-2020 
##                                         0.47531658 
##  Resident 65 Plus Population Estimate - Jul-1-2020 
##                                         0.13600669 
##                              Residual - Jul-1-2020 
##                                         2.78992635

2021

fviz_pca_biplot(pca2021,
                col.var = "red",
                col.ind = "black")

Es posible observar que sujetos como el 6 tienden a tener mas preferencia por residual en comparacion con otras variables, o el sujeto 48 se encuentra mas inclinado por net domestic migration

data2021[6,]
##    Resident Total Population Estimate - Jul-1-2021 
##                                         -0.1862544 
##                Net Domestic Migration - Jul-1-2021 
##                                          0.8179269 
## Federal/Civilian Movement from Abroad - Jul-1-2021 
##                                         -0.1681316 
##           Net International Migration - Jul-1-2021 
##                                         -0.1141287 
##                         Period Births - Jul-1-2021 
##                                         -0.1599211 
##                         Period Deaths - Jul-1-2021 
##                                         -0.4113328 
## Resident Under 65 Population Estimate - Jul-1-2021 
##                                         -0.1627187 
##  Resident 65 Plus Population Estimate - Jul-1-2021 
##                                         -0.3540924 
##                              Residual - Jul-1-2021 
##                                          1.0247546
data2021[48,]
##    Resident Total Population Estimate - Jul-1-2021 
##                                        0.064459658 
##                Net Domestic Migration - Jul-1-2021 
##                                        0.437039714 
## Federal/Civilian Movement from Abroad - Jul-1-2021 
##                                       -0.772697762 
##           Net International Migration - Jul-1-2021 
##                                        0.026219017 
##                         Period Births - Jul-1-2021 
##                                        0.009235824 
##                         Period Deaths - Jul-1-2021 
##                                       -0.062897824 
## Resident Under 65 Population Estimate - Jul-1-2021 
##                                        0.076784090 
##  Resident 65 Plus Population Estimate - Jul-1-2021 
##                                       -0.028443946 
##                              Residual - Jul-1-2021 
##                                        0.381199726

Analisis de Proporcion Spss

psych::cor.plot(data2020)

Es posible observar que la mayoria de las correlacones entre los componentes es fuerte ya que son muy cercanos a 1

2021

psych::cor.plot(data2021)

Es posible observar que la mayoria de las correlacones entre los componentes es fuerte ya que son muy cercanos a 1

A continuacion aplicaremos rotacion varimax para asi ajustar las cargas de modo que cada variable tenga una carga alta en un solo componente haciendo que la estructura sea mas simple y clara 2020

pca20202 <- psych::principal(data2020, nfactors = 1, residuals =FALSE, rotate ="varimax",
                         scores = TRUE, oblique.scores = FALSE, method = "regresion",
                         use = "pairwise", cor = "cor", weight = NULL)
## Warning in cor.smooth(r): Matrix was not positive definite, smoothing was done
## In factor.stats, I could not find the RMSEA upper bound . Sorry about that
## Warning in psych::principal(data2020, nfactors = 1, residuals = FALSE, rotate =
## "varimax", : The matrix is not positive semi-definite, scores found from
## Structure loadings
pca20202
## Principal Components Analysis
## Call: psych::principal(r = data2020, nfactors = 1, residuals = FALSE, 
##     rotate = "varimax", scores = TRUE, oblique.scores = FALSE, 
##     method = "regresion", use = "pairwise", cor = "cor", weight = NULL)
## Standardized loadings (pattern matrix) based upon correlation matrix
##                                                      PC1    h2     u2 com
## Census Resident Total Population - AB:Qr-1-2020     1.00 0.992 0.0080   1
## Resident Total Population Estimate - Jul-1-2020     1.00 0.992 0.0078   1
## Net Domestic Migration - Jul-1-2020                -0.24 0.059 0.9408   1
## Federal/Civilian Movement from Abroad - Jul-1-2020  0.75 0.569 0.4312   1
## Net International Migration - Jul-1-2020            0.94 0.887 0.1130   1
## Period Births - Jul-1-2020                          0.99 0.985 0.0147   1
## Period Deaths - Jul-1-2020                          0.97 0.933 0.0668   1
## Resident Under 65 Population Estimate - Jul-1-2020  1.00 0.993 0.0074   1
## Resident 65 Plus Population Estimate - Jul-1-2020   0.96 0.928 0.0721   1
## Residual - Jul-1-2020                               0.22 0.047 0.9532   1
## 
##                 PC1
## SS loadings    7.38
## Proportion Var 0.74
## 
## Mean item complexity =  1
## Test of the hypothesis that 1 component is sufficient.
## 
## The root mean square of the residuals (RMSR) is  0.08 
##  with the empirical chi square  27.37  with prob <  0.82 
## 
## Fit based upon off diagonal values = 0.99

2021

pca20212 <- psych::principal(data2021, nfactors = 1, residuals =FALSE, rotate ="varimax",
                         scores = TRUE, oblique.scores = FALSE, method = "regresion",
                         use = "pairwise", cor = "cor", weight = NULL)
## Warning in cor.smooth(r): Matrix was not positive definite, smoothing was done
## Warning in psych::principal(data2021, nfactors = 1, residuals = FALSE, rotate =
## "varimax", : The matrix is not positive semi-definite, scores found from
## Structure loadings
pca20212
## Principal Components Analysis
## Call: psych::principal(r = data2021, nfactors = 1, residuals = FALSE, 
##     rotate = "varimax", scores = TRUE, oblique.scores = FALSE, 
##     method = "regresion", use = "pairwise", cor = "cor", weight = NULL)
## Standardized loadings (pattern matrix) based upon correlation matrix
##                                                      PC1   h2    u2 com
## Resident Total Population Estimate - Jul-1-2021     0.99 0.98 0.016   1
## Net Domestic Migration - Jul-1-2021                -0.10 0.01 0.990   1
## Federal/Civilian Movement from Abroad - Jul-1-2021 -0.76 0.57 0.427   1
## Net International Migration - Jul-1-2021            0.94 0.89 0.111   1
## Period Births - Jul-1-2021                          0.99 0.98 0.022   1
## Period Deaths - Jul-1-2021                          0.96 0.92 0.081   1
## Resident Under 65 Population Estimate - Jul-1-2021  0.99 0.98 0.016   1
## Resident 65 Plus Population Estimate - Jul-1-2021   0.96 0.92 0.076   1
## Residual - Jul-1-2021                               0.38 0.14 0.857   1
## 
##                 PC1
## SS loadings    6.40
## Proportion Var 0.71
## 
## Mean item complexity =  1
## Test of the hypothesis that 1 component is sufficient.
## 
## The root mean square of the residuals (RMSR) is  0.13 
##  with the empirical chi square  62.55  with prob <  0.00012 
## 
## Fit based upon off diagonal values = 0.97

Con ello es posible observar que para ambos años con un solo componente se explica la maxima varianza en el modelo, podrian usarse mas componentes principales, pero es mas factible y optima la representacion de unicamente un componente, por lo que solo se utilizara un componenete para cada año. A continuacion una matriz de coeficientes sobre el componente 2020

pca20202$weights[,1]
##    Census Resident Total Population - AB:Qr-1-2020 
##                                          0.9960094 
##    Resident Total Population Estimate - Jul-1-2020 
##                                          0.9961092 
##                Net Domestic Migration - Jul-1-2020 
##                                         -0.2433636 
## Federal/Civilian Movement from Abroad - Jul-1-2020 
##                                          0.7541575 
##           Net International Migration - Jul-1-2020 
##                                          0.9417826 
##                         Period Births - Jul-1-2020 
##                                          0.9925980 
##                         Period Deaths - Jul-1-2020 
##                                          0.9659996 
## Resident Under 65 Population Estimate - Jul-1-2020 
##                                          0.9962822 
##  Resident 65 Plus Population Estimate - Jul-1-2020 
##                                          0.9632815 
##                              Residual - Jul-1-2020 
##                                          0.2162589

2021

pca20212$weights[,1]
##    Resident Total Population Estimate - Jul-1-2021 
##                                          0.9917747 
##                Net Domestic Migration - Jul-1-2021 
##                                         -0.1016653 
## Federal/Civilian Movement from Abroad - Jul-1-2021 
##                                         -0.7569791 
##           Net International Migration - Jul-1-2021 
##                                          0.9430850 
##                         Period Births - Jul-1-2021 
##                                          0.9888870 
##                         Period Deaths - Jul-1-2021 
##                                          0.9584075 
## Resident Under 65 Population Estimate - Jul-1-2021 
##                                          0.9917319 
##  Resident 65 Plus Population Estimate - Jul-1-2021 
##                                          0.9613842 
##                              Residual - Jul-1-2021 
##                                          0.3783821

Asi bien podemos obtener nuevas variables para ambos años cuya principal caracteristica es que son ortogonales, este conjunto de variables se redujo unicamente a un componente, explicadas de la siguiente manera

2020

pca20202$scores
##              PC1
##  [1,] -1.5258931
##  [2,] -5.2563647
##  [3,] -0.4834442
##  [4,] -3.3867482
##  [5,] 35.1607029
##  [6,] -1.4699227
##  [7,] -2.7385370
##  [8,] -5.4149179
##  [9,] -5.3763381
## [10,] 13.2696113
## [11,]  3.4049473
## [12,] -4.0868235
## [13,] -4.8201033
## [14,]  8.2824724
## [15,] -0.2305478
## [16,] -3.2919770
## [17,] -3.0463597
## [18,] -1.7426352
## [19,] -1.2611356
## [20,] -4.9806910
## [21,] -0.1513465
## [22,]  0.4469126
## [23,]  3.4809136
## [24,] -1.4130938
## [25,] -2.9808749
## [26,] -0.2538187
## [27,] -5.2840724
## [28,] -4.3475150
## [29,] -3.8984811
## [30,] -5.1237520
## [31,]  3.1064299
## [32,] -4.1333851
## [33,] 16.3028103
## [34,]  3.4855902
## [35,] -5.4137779
## [36,]  5.1414863
## [37,] -2.2062903
## [38,] -2.8184925
## [39,]  6.2407020
## [40,] -5.1541031
## [41,] -1.7135265
## [42,] -5.4270185
## [43,] -0.4144366
## [44,] 18.4379004
## [45,] -3.8242585
## [46,] -5.6845281
## [47,]  3.5835551
## [48,]  0.1735260
## [49,] -4.3704598
## [50,] -1.1169911
## [51,] -5.6748993

2021

pca20212$scores
##               PC1
##  [1,] -1.53327597
##  [2,] -4.51280311
##  [3,] -0.02019033
##  [4,] -2.94940245
##  [5,] 30.46215897
##  [6,] -0.91464984
##  [7,] -2.57475141
##  [8,] -4.67144483
##  [9,] -4.71328295
## [10,] 13.11522707
## [11,]  3.81496428
## [12,] -3.50507534
## [13,] -4.09328054
## [14,]  7.03596935
## [15,] -0.40159893
## [16,] -2.98238837
## [17,] -2.60799916
## [18,] -1.61605543
## [19,] -1.21403130
## [20,] -4.32002652
## [21,] -0.17423794
## [22,] -0.23961009
## [23,]  2.64771663
## [24,] -1.30336042
## [25,] -2.58233216
## [26,] -0.42770220
## [27,] -4.54503175
## [28,] -3.77546455
## [29,] -2.84370072
## [30,] -4.41089684
## [31,]  2.41782296
## [32,] -3.65924022
## [33,] 12.86670091
## [34,]  3.40910564
## [35,] -4.66653745
## [36,]  3.81121371
## [37,] -1.90446894
## [38,] -2.53875309
## [39,]  4.61200726
## [40,] -4.48423345
## [41,] -1.50262576
## [42,] -4.62776319
## [43,] -0.41892868
## [44,] 16.04524610
## [45,] -3.31479665
## [46,] -4.91349956
## [47,]  3.99093786
## [48,]  0.77103477
## [49,] -3.86692936
## [50,] -1.26216531
## [51,] -4.90757073