Analisis de componentes principales para el archivo “data_pca”
Ahora Normalicemos los datos
data2 <- scale(data_pca[,-16])
View(data2)
Hagamos una prueba de correlacion, para observar si es posible aplicar un analisis de componenetes principales, el resultado de dicha correlacion tiene que ser cercano a cero
det(cor(data2))
## [1] 0.004667778
El resultado obtenido es casi igual a cero por lo que es posible realizar un analisis de componentes principales, ahora analizaremos las cargas de cada componente segun la variable
pca <- princomp(data2)
pca$loadings
##
## Loadings:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9 Comp.10
## x1 0.141 0.320 0.200 0.604 0.137 0.102 0.199 0.109
## x2 -0.419 0.180 -0.248 0.145 0.236 -0.110 -0.570
## x3 -0.274 0.177 0.259 0.483 -0.131 -0.207 0.367
## x4 -0.307 0.447 -0.223 0.129 -0.167 -0.291 -0.399 -0.152
## x5 -0.236 -0.410 -0.354 0.102 -0.163
## x6 0.132 -0.203 -0.628 0.178 -0.112 0.340
## x7 0.112 0.340 -0.464 -0.345 0.151 -0.194 0.136
## x8 0.385 -0.246 0.153 -0.132 -0.268 0.382 0.424 -0.258
## x9 0.142 0.246 -0.284 -0.510 -0.336 0.157 -0.536
## x10 -0.386 0.110 -0.342 -0.331 0.327 0.273 0.383
## x11 0.137 -0.255 0.122 -0.300 0.528 0.574 -0.364
## x12 0.102 0.330 -0.226 0.346 -0.643 -0.434 -0.209
## x13 0.232 -0.223 -0.648 -0.301 0.246
## x14 0.175 0.410 0.305 -0.182 0.158 0.154 -0.256
## x15 0.521 -0.153 0.135 0.173 -0.256 -0.249 0.151
## Comp.11 Comp.12 Comp.13 Comp.14 Comp.15
## x1 0.134 0.135 0.588
## x2 0.203 -0.186 -0.459
## x3 -0.232 -0.412 0.113 -0.382
## x4 0.576
## x5 -0.582 0.204 0.470
## x6 -0.171 -0.566 0.147
## x7 0.662
## x8 -0.385 -0.355
## x9 -0.367
## x10 0.367 -0.367
## x11 0.237
## x12 0.223
## x13 0.136 0.524
## x14 -0.693 -0.248
## x15 0.344 -0.611
##
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
## SS loadings 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
## Proportion Var 0.067 0.067 0.067 0.067 0.067 0.067 0.067 0.067 0.067
## Cumulative Var 0.067 0.133 0.200 0.267 0.333 0.400 0.467 0.533 0.600
## Comp.10 Comp.11 Comp.12 Comp.13 Comp.14 Comp.15
## SS loadings 1.000 1.000 1.000 1.000 1.000 1.000
## Proportion Var 0.067 0.067 0.067 0.067 0.067 0.067
## Cumulative Var 0.667 0.733 0.800 0.867 0.933 1.000
Con la siguiente Linea de codigo es posible checar la proporcion de la varianza
summary(pca)
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Standard deviation 1.6220588 1.4501268 1.3332930 1.2434264 1.15529908
## Proportion of Variance 0.1762864 0.1408957 0.1191069 0.1035919 0.08942821
## Cumulative Proportion 0.1762864 0.3171821 0.4362890 0.5398809 0.62930907
## Comp.6 Comp.7 Comp.8 Comp.9 Comp.10
## Standard deviation 1.05569426 0.90471763 0.88908929 0.8622762 0.80999883
## Proportion of Variance 0.07467272 0.05484181 0.05296347 0.0498171 0.04395967
## Cumulative Proportion 0.70398179 0.75882360 0.81178707 0.8616042 0.90556384
## Comp.11 Comp.12 Comp.13 Comp.14 Comp.15
## Standard deviation 0.7012045 0.59518243 0.53958339 0.4662240 0.234552581
## Proportion of Variance 0.0329439 0.02373482 0.01950755 0.0145638 0.003686091
## Cumulative Proportion 0.9385077 0.96224255 0.98175010 0.9963139 1.000000000
con esto es posible observar que la mayor proporcion de varianza se encuentra con valores mas altos es con dos componentes, en el tercero esta ya se ve mas reducida. A continuacion revisaremos la varianza y los eigenvalores
fviz_eig(pca,choice = "variance")
fviz_eig(pca, choice = "eigenvalue")
Tras esta ejecucion es posible observar, que efectivamente el componente
uno y el componente dos, aportan mayor varianza
Analisis Grafico
Representaciones Factoriales y su representacion:
fviz_pca_ind(pca,
col.ind = "cos2",
gradient.cols=c("red", "yellow", "green"),
repel = FALSE)
Aqui es posible analizar que el coseno cuadrado indica la proporcion de
la varianza de la variable es explicada por el componente principal,
aquellos valores cercanos a 1, se encuentran bien representados, y de
manera gradual a como va incrementando el color hasta llegar a rojo,
menor representacion tienen las variables, por lo que es posible
explicar que la mayoria de las variables no se encuentran bien
representadas por el componente principal, pero esto depende del grupo
de datos y de su estudio.
Grafico de Cargas Factoriales
fviz_pca_var(pca,
col.var = "contrib",
gradient.cols = c('red', 'yellow', 'green'),
repel = FALSE)
Es posible observar que las variables x10,x2,x3, x8,x15,y x6 representan
un a dimension y las variables x5, x4, x11, x12, x7, x13,x9,x1,x14
representan una dimension totalmente diferente.
Usaremos un biplot para visualizar las puntuaciones
fviz_pca_biplot(pca,
col.var = "red",
col.ind = "black")
En este analisis bidimensional es posible observar que sujetos como el 8 en el cuadrante 4, tienden a darle mas importancia a variables como x15, y menor importancia a las variables como x2, observemos a continuacion
data2[8,]
## x1 x2 x3 x4 x5 x6 x7
## -0.3361816 -2.2846106 -0.9850069 -0.3205148 -0.3698024 1.1582924 0.7202647
## x8 x9 x10 x11 x12 x13 x14
## 0.9307831 1.6138598 -1.5768108 -0.8876292 -0.9577966 0.2812746 -0.3031942
## x15
## 2.3942047
En el cuadrante 2, sujetos como el 90 le dan mas importancia a variables como x13, y menor importancia a variables como x4
data2[90,]
## x1 x2 x3 x4 x5 x6
## 0.77083066 -0.30511876 -0.17814631 -1.08357443 -0.92382349 -0.05815972
## x7 x8 x9 x10 x11 x12
## 1.96482907 -0.08019998 -0.17110473 0.02277015 -0.32292431 0.78273451
## x13 x14 x15
## 2.06467889 1.22680995 0.85446103
Analisis de Proporcion Spss
psych::cor.plot(data2)
Es posible observar que la correlacion entre las variables es muy baja,
ya que la mayoria son negativas entre ellas, aquellas variabes con mas
correlacion entre si son x7, con x9, las variables x1 con x14, ademas de
x15 con x8 por mencionar algunas.
A continuacio aplicaremos rotacion varimax para asi ajustar las cargas de modo que cada variable tenga una carga alta en un solo componente haciendo que la estructura sea mas simple y clara
pca2 <- psych::principal(data2, nfactors = 2, residuals =FALSE, rotate ="varimax",
scores = TRUE, oblique.scores = FALSE, method = "regresion",
use = "pairwise", cor = "cor", weight = NULL)
pca2
## Principal Components Analysis
## Call: psych::principal(r = data2, nfactors = 2, residuals = FALSE,
## rotate = "varimax", scores = TRUE, oblique.scores = FALSE,
## method = "regresion", use = "pairwise", cor = "cor", weight = NULL)
## Standardized loadings (pattern matrix) based upon correlation matrix
## RC1 RC2 h2 u2 com
## x1 0.04 0.52 0.269 0.73 1.0
## x2 -0.73 0.00 0.533 0.47 1.0
## x3 -0.51 0.08 0.265 0.73 1.0
## x4 0.09 -0.45 0.206 0.79 1.1
## x5 -0.14 -0.70 0.503 0.50 1.1
## x6 0.22 0.03 0.048 0.95 1.0
## x7 -0.01 0.53 0.278 0.72 1.0
## x8 0.71 -0.11 0.519 0.48 1.0
## x9 0.09 0.42 0.181 0.82 1.1
## x10 -0.64 -0.08 0.419 0.58 1.0
## x11 -0.02 0.20 0.042 0.96 1.0
## x12 -0.02 0.51 0.258 0.74 1.0
## x13 0.12 0.12 0.030 0.97 2.0
## x14 0.05 0.66 0.436 0.56 1.0
## x15 0.87 0.10 0.768 0.23 1.0
##
## RC1 RC2
## SS loadings 2.57 2.18
## Proportion Var 0.17 0.15
## Cumulative Var 0.17 0.32
## Proportion Explained 0.54 0.46
## Cumulative Proportion 0.54 1.00
##
## Mean item complexity = 1.1
## Test of the hypothesis that 2 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0.14
## with the empirical chi square 770.91 with prob < 1.5e-115
##
## Fit based upon off diagonal values = 0.51
Con ello es posible observar que con dos componentes se explica la maxima varianza en el modelo, podrian usarse hasta 4 componentes principales, pero es mas factible y optima la representacion de unicamente dos componentes, por lo que solo se utilizaran dos componenetes. A continuacion una matriz de coeficientes sobre los dos componentes
pca2$weights[,1]
## x1 x2 x3 x4 x5
## 0.0008960604 -0.2851378248 -0.2012827708 0.0481355724 -0.0332723838
## x6 x7 x8 x9 x10
## 0.0837472648 -0.0202342745 0.2817042861 0.0202814144 -0.2486671202
## x11 x12 x13 x14 x15
## -0.0151416330 -0.0236242067 0.0447827540 -0.0013696313 0.3369550463
pca2$weights[,2]
## x1 x2 x3 x4 x5 x6
## 0.236717095 0.021936576 0.052252404 -0.208155734 -0.315680597 0.008795705
## x7 x8 x9 x10 x11 x12
## 0.243092448 -0.071914925 0.189412732 -0.015124597 0.095096723 0.234550137
## x13 x14 x15
## 0.051813948 0.301661558 0.017665734
Asi bien podemos obtener nuevas variables cuya principal caracteristica es que son ortogonales, este conjunto de variables se redujo unicamente a dos variables, explicadas de la siguiente manera
pca2$scores
## RC1 RC2
## [1,] -1.663181432 -3.47805844
## [2,] -0.363562031 -0.25080118
## [3,] 0.355426863 -0.37948303
## [4,] 0.131447901 0.18641872
## [5,] -1.728595344 0.07626033
## [6,] 0.886144950 -0.74574263
## [7,] -1.791375398 -1.52348776
## [8,] 2.471554555 0.10651225
## [9,] -0.148288881 0.89193562
## [10,] 1.108288252 -1.05150773
## [11,] 0.917048312 0.64596530
## [12,] 2.136851958 2.01301453
## [13,] 0.194033287 -0.81137340
## [14,] -0.869718074 1.23715076
## [15,] 0.044982518 0.47987229
## [16,] 0.084967686 0.01060598
## [17,] 0.793574599 1.89480493
## [18,] -0.119102206 -1.67230798
## [19,] -0.080510505 -0.13905438
## [20,] -0.151593533 -0.34167904
## [21,] 1.509186645 -2.50404963
## [22,] 0.025517491 0.91758816
## [23,] 0.168060496 -0.09080835
## [24,] 0.040314586 -0.06384400
## [25,] 0.105099516 0.32604240
## [26,] -1.141803540 -0.53291409
## [27,] -0.327694558 0.57688400
## [28,] -1.745004932 -2.11925524
## [29,] -1.886388920 1.36596967
## [30,] -1.408426430 0.53267094
## [31,] 2.255198855 -0.01305522
## [32,] -0.953839808 0.61028878
## [33,] 0.462452562 -0.81267787
## [34,] -2.186063854 0.95664069
## [35,] 1.335112083 1.44962001
## [36,] 0.263479483 -1.17372941
## [37,] -1.630151831 -0.26740084
## [38,] 0.187192927 1.77358413
## [39,] 0.259357062 0.24221556
## [40,] -0.264116689 1.50745427
## [41,] -0.649503681 -0.32114967
## [42,] 1.195849679 0.52694049
## [43,] -1.364604928 0.60535110
## [44,] -0.826674665 0.27129448
## [45,] 0.632862315 0.13811486
## [46,] -1.279442465 -0.34897872
## [47,] 0.521722448 1.43964201
## [48,] 2.057561982 -1.56602701
## [49,] -0.694814644 -1.64132533
## [50,] 0.198683933 -0.66074941
## [51,] -0.096684160 0.08420017
## [52,] -0.080792492 -0.76697582
## [53,] 0.523728599 0.79783410
## [54,] -0.881160378 -0.09244674
## [55,] 0.889791101 1.67006207
## [56,] 1.516734928 0.47128242
## [57,] 0.205675362 1.81559398
## [58,] -1.211172109 1.48363719
## [59,] 0.285751751 0.37764667
## [60,] -1.882018623 2.45253675
## [61,] 0.001242478 0.49272459
## [62,] 0.229126806 -0.74076759
## [63,] -0.990237939 -1.88936199
## [64,] 0.712992703 0.27760981
## [65,] 0.403242858 0.08942812
## [66,] 0.008525364 0.80490821
## [67,] -0.500863674 -1.03679961
## [68,] -1.688732565 0.62609716
## [69,] -0.879917608 -0.18706950
## [70,] 0.249449544 1.91471624
## [71,] -0.283464801 -0.03073237
## [72,] -0.091621925 0.44957214
## [73,] 0.861822200 -0.04294464
## [74,] 0.078035665 0.04721447
## [75,] -2.292492422 0.38474563
## [76,] 1.728119167 -0.42831342
## [77,] 0.094645513 -0.07866099
## [78,] 1.021409038 1.20287383
## [79,] 0.264695360 -0.75680623
## [80,] 0.112260616 -0.07658599
## [81,] 1.084519001 -0.09232373
## [82,] -0.532661091 0.25672190
## [83,] 1.791813516 1.26091207
## [84,] 0.473204015 0.81810229
## [85,] -0.684905270 1.32510017
## [86,] 0.940595870 -1.30853778
## [87,] -1.244494813 0.69632648
## [88,] 0.889744074 0.12173111
## [89,] -1.680528953 0.34984533
## [90,] 0.390870811 1.77882636
## [91,] -1.149286816 0.20564591
## [92,] -0.883995431 -0.05331987
## [93,] -0.056006952 -0.05773772
## [94,] 0.622921653 0.20797725
## [95,] 1.837861719 -0.71191950
## [96,] 1.146132246 -0.98789143
## [97,] 0.831304077 -0.65399469
## [98,] -0.167267711 -0.09652808
## [99,] 0.157844745 0.57471758
## [100,] 1.289681991 0.33736814
## [101,] -0.009518897 -0.66442164
## [102,] -1.131694227 0.10420478
## [103,] 0.227820260 -1.77320271
## [104,] 0.343636435 0.28669628
## [105,] -1.015254442 0.49414549
## [106,] 1.227695664 -1.13583579
## [107,] -0.653524974 0.42617784
## [108,] 0.935352727 -2.19226831
## [109,] -0.945359579 0.75278169
## [110,] 0.798493248 0.59527668
## [111,] -0.182434415 -0.18049814
## [112,] -0.108828086 1.36744887
## [113,] 0.068642652 -1.90638751
## [114,] -0.505808483 0.17625129
## [115,] -1.313357721 -1.42694362
## [116,] -1.379364286 -2.12241890
## [117,] 0.555141327 1.98190981
## [118,] -0.667412652 0.91860021
## [119,] 1.337478042 -0.39756339
## [120,] -1.656553170 0.25584312
## [121,] -1.977257823 0.41552427
## [122,] 1.343502445 0.14766768
## [123,] -0.540823678 -0.96358806
## [124,] 0.625136393 -0.57698845
## [125,] 0.718927880 1.62673405
## [126,] -0.195155692 0.89523542
## [127,] 0.910692513 -1.41201180
## [128,] -0.540040515 -2.07462041
## [129,] 0.600264251 -0.68426890
## [130,] 0.177979990 -0.82184466
## [131,] -0.839385964 -0.75379213
## [132,] -0.588186259 0.63876238
## [133,] -1.024527343 0.16243794
## [134,] -1.389595773 1.45835963
## [135,] 0.667635654 -0.39040295
## [136,] -0.056720053 -0.55261685
## [137,] -1.362855606 -0.78420294
## [138,] 1.337345287 0.80777575
## [139,] -0.366397759 -0.51005677
## [140,] 0.729524091 -1.13077035
## [141,] 0.681572460 -0.22331639
## [142,] -0.278730217 1.31442187
## [143,] 1.471062711 -0.47770100
## [144,] -0.132713631 0.34929955
## [145,] -0.496178664 -0.33851983
## [146,] 0.600822286 -0.78493424
## [147,] 0.272593112 0.06594810
## [148,] -1.931720581 -1.85024125
## [149,] -0.350702532 0.85353899
## [150,] -0.374593439 2.01477013
## [151,] 0.068601982 -0.60043400
## [152,] 0.645537289 -1.94493688
## [153,] -0.626378015 -0.21433197
## [154,] 1.481035203 -0.99665310
## [155,] 2.207929960 0.12640557
## [156,] 0.381041005 0.09880233
## [157,] 0.390060709 -0.33821528
## [158,] 1.363127868 1.53326647
## [159,] -0.178529026 -0.63247144
## [160,] -0.430415472 1.27189265
## [161,] -1.830687715 -0.07070883
## [162,] -0.844544670 -1.39959915
## [163,] -0.365501252 0.09688615
## [164,] 0.827537887 -0.31323932
## [165,] 0.094443091 0.66355454
## [166,] 1.734208120 0.18287931
## [167,] 1.101412087 0.10001821
## [168,] 0.139050937 0.97362543
## [169,] -1.270278461 0.58877299
## [170,] 0.989258008 -0.84731891
## [171,] -1.185035259 0.98452974
## [172,] -0.966644026 -0.08551493
## [173,] 0.055322471 -1.15963929
## [174,] 0.185883835 -1.01999761
## [175,] 1.376528781 -0.05288940
## [176,] 0.879527355 -0.30554160
## [177,] -0.025087450 0.86804996
## [178,] 1.501287798 -0.41856491
## [179,] 0.446654247 0.48227162
## [180,] -2.502882395 -0.34160096
## [181,] -0.719805613 1.22757385
## [182,] 0.513419818 0.34717255
## [183,] 1.488350553 -0.65111277
## [184,] 0.688111149 0.86961481
## [185,] -0.248192785 -0.04273693
## [186,] -0.774447598 -0.33076244
## [187,] 0.213469367 0.14410373
## [188,] -0.451083343 -0.50725241
## [189,] 0.759350447 2.18536730
## [190,] 0.832397164 0.14776045
## [191,] 0.119632381 0.53035188
## [192,] -0.225847994 -0.95650810
## [193,] -0.180965925 -1.27041253
## [194,] 0.374494531 -1.40665842
## [195,] 0.238388497 0.69270164
## [196,] -1.276731027 1.06037324
## [197,] 0.236446028 -0.88610798
## [198,] 0.742677578 -0.91861134
## [199,] -2.084771954 -0.63576583
## [200,] 0.063069234 -0.36840522
Base de Datos “Poblacion USA (2020)”
library(readxl)
## Warning: package 'readxl' was built under R version 4.4.3
PoblacionUSA <- read_excel("C:/Users/Ines Kimberly/Downloads/PoblacionUSA.xlsm")
View(PoblacionUSA)
A continuacion normalizaremos unicamente los datos correspondientes al año 2020 y 2021
Hagamos una prueba de correlacion, para observar si es posible aplicar un analisis de componenetes principales, el resultado de dicha correlacion tiene que ser cercano a cero
det(cor(data2020))
## [1] -8.80948e-41
det(cor(data2021))
## [1] -4.747386e-25
El resultado es practicamente cero para amboa años, por lo que es factible utilizar el pca Ahora realizaremos un analisis del factor de adecuacion Muestral de Kayser-Mayer
psych::KMO(data2020)
## Error in solve.default(r) :
## sistema es computacionalmente singular: número de condición recíproco = 3.40445e-18
## matrix is not invertible, image not found
## Kaiser-Meyer-Olkin factor adequacy
## Call: psych::KMO(r = data2020)
## Overall MSA = 0.5
## MSA for each item =
## Census Resident Total Population - AB:Qr-1-2020
## 0.5
## Resident Total Population Estimate - Jul-1-2020
## 0.5
## Net Domestic Migration - Jul-1-2020
## 0.5
## Federal/Civilian Movement from Abroad - Jul-1-2020
## 0.5
## Net International Migration - Jul-1-2020
## 0.5
## Period Births - Jul-1-2020
## 0.5
## Period Deaths - Jul-1-2020
## 0.5
## Resident Under 65 Population Estimate - Jul-1-2020
## 0.5
## Resident 65 Plus Population Estimate - Jul-1-2020
## 0.5
## Residual - Jul-1-2020
## 0.5
psych::KMO(data2021)
## Error in solve.default(r) :
## sistema es computacionalmente singular: número de condición recíproco = 4.27828e-18
## matrix is not invertible, image not found
## Kaiser-Meyer-Olkin factor adequacy
## Call: psych::KMO(r = data2021)
## Overall MSA = 0.5
## MSA for each item =
## Resident Total Population Estimate - Jul-1-2021
## 0.5
## Net Domestic Migration - Jul-1-2021
## 0.5
## Federal/Civilian Movement from Abroad - Jul-1-2021
## 0.5
## Net International Migration - Jul-1-2021
## 0.5
## Period Births - Jul-1-2021
## 0.5
## Period Deaths - Jul-1-2021
## 0.5
## Resident Under 65 Population Estimate - Jul-1-2021
## 0.5
## Resident 65 Plus Population Estimate - Jul-1-2021
## 0.5
## Residual - Jul-1-2021
## 0.5
Para ambos años presenta un valor igual a 0.5, lo que nos indica un valor regular de KMO Ahora analizaremos las cargas de cada componente segun la variable
pca2020 <- princomp(data2020)
pca2020$loadings
##
## Loadings:
## Comp.1 Comp.2 Comp.3 Comp.4
## Census Resident Total Population - AB:Qr-1-2020 0.367
## Resident Total Population Estimate - Jul-1-2020 0.367
## Net Domestic Migration - Jul-1-2020 -0.660 0.669 0.256
## Federal/Civilian Movement from Abroad - Jul-1-2020 0.278 -0.343 0.124 -0.799
## Net International Migration - Jul-1-2020 0.347 -0.102 -0.209
## Period Births - Jul-1-2020 0.365
## Period Deaths - Jul-1-2020 0.355 0.122 0.292
## Resident Under 65 Population Estimate - Jul-1-2020 0.367
## Resident 65 Plus Population Estimate - Jul-1-2020 0.354 0.149 0.288
## Residual - Jul-1-2020 -0.654 -0.698 0.255
## Comp.5 Comp.6 Comp.7 Comp.8
## Census Resident Total Population - AB:Qr-1-2020 0.191 0.349
## Resident Total Population Estimate - Jul-1-2020 0.193 0.346
## Net Domestic Migration - Jul-1-2020 -0.176 0.107
## Federal/Civilian Movement from Abroad - Jul-1-2020 0.323 -0.214
## Net International Migration - Jul-1-2020 -0.878 -0.143 0.170
## Period Births - Jul-1-2020 0.547 -0.666 -0.347
## Period Deaths - Jul-1-2020 0.266 -0.323 -0.397 0.658
## Resident Under 65 Population Estimate - Jul-1-2020 0.299 0.386 0.118
## Resident 65 Plus Population Estimate - Jul-1-2020 -0.590 -0.633
## Residual - Jul-1-2020
## Comp.9 Comp.10
## Census Resident Total Population - AB:Qr-1-2020 0.807 0.203
## Resident Total Population Estimate - Jul-1-2020 -0.540 0.634
## Net Domestic Migration - Jul-1-2020
## Federal/Civilian Movement from Abroad - Jul-1-2020
## Net International Migration - Jul-1-2020
## Period Births - Jul-1-2020
## Period Deaths - Jul-1-2020
## Resident Under 65 Population Estimate - Jul-1-2020 -0.237 -0.740
## Resident 65 Plus Population Estimate - Jul-1-2020 -0.101
## Residual - Jul-1-2020
##
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
## SS loadings 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
## Proportion Var 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
## Cumulative Var 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
## Comp.10
## SS loadings 1.0
## Proportion Var 0.1
## Cumulative Var 1.0
pca2021 <- princomp(data2021)
pca2021$loadings
##
## Loadings:
## Comp.1 Comp.2 Comp.3 Comp.4
## Resident Total Population Estimate - Jul-1-2021 0.392
## Net Domestic Migration - Jul-1-2021 -0.690 0.585 0.260
## Federal/Civilian Movement from Abroad - Jul-1-2021 -0.299 0.292 0.528 -0.698
## Net International Migration - Jul-1-2021 0.373 -0.239 -0.242
## Period Births - Jul-1-2021 0.391
## Period Deaths - Jul-1-2021 0.379 0.328
## Resident Under 65 Population Estimate - Jul-1-2021 0.392
## Resident 65 Plus Population Estimate - Jul-1-2021 0.380 0.335
## Residual - Jul-1-2021 0.150 -0.639 -0.296 -0.617
## Comp.5 Comp.6 Comp.7 Comp.8
## Resident Total Population Estimate - Jul-1-2021 0.217 0.475
## Net Domestic Migration - Jul-1-2021 -0.324
## Federal/Civilian Movement from Abroad - Jul-1-2021 -0.148 0.192
## Net International Migration - Jul-1-2021 -0.783 -0.299 -0.108 0.174
## Period Births - Jul-1-2021 0.627 -0.595 -0.309
## Period Deaths - Jul-1-2021 0.367 -0.225 -0.350 0.657
## Resident Under 65 Population Estimate - Jul-1-2021 0.317 0.534 0.123
## Resident 65 Plus Population Estimate - Jul-1-2021 0.156 -0.526 -0.651
## Residual - Jul-1-2021 0.315
## Comp.9
## Resident Total Population Estimate - Jul-1-2021 0.746
## Net Domestic Migration - Jul-1-2021
## Federal/Civilian Movement from Abroad - Jul-1-2021
## Net International Migration - Jul-1-2021
## Period Births - Jul-1-2021
## Period Deaths - Jul-1-2021
## Resident Under 65 Population Estimate - Jul-1-2021 -0.660
## Resident 65 Plus Population Estimate - Jul-1-2021
## Residual - Jul-1-2021
##
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
## SS loadings 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
## Proportion Var 0.111 0.111 0.111 0.111 0.111 0.111 0.111 0.111 0.111
## Cumulative Var 0.111 0.222 0.333 0.444 0.556 0.667 0.778 0.889 1.000
Con la siguiente Linea de codigo es posible checar la proporcion de la varianza
summary(pca2020)
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Standard deviation 2.6907367 1.1607536 0.8280683 0.6125685 0.33421800
## Proportion of Variance 0.7384865 0.1374296 0.0699411 0.0382745 0.01139357
## Cumulative Proportion 0.7384865 0.8759161 0.9458572 0.9841317 0.99552529
## Comp.6 Comp.7 Comp.8 Comp.9 Comp.10
## Standard deviation 0.204077430 0.0391252568 2.629360e-02 0 0
## Proportion of Variance 0.004248055 0.0001561401 7.051803e-05 0 0
## Cumulative Proportion 0.999773342 0.9999294820 1.000000e+00 1 1
Para 2020 es posible observar que con un solo componente es mas que suficciente ya que ahi se encuentra explicado un 74% de la varianza,se tendria que seguir analizando para ver si es recomendable usar otro componente mas.
summary(pca2021)
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Standard deviation 2.5056381 1.2841609 0.66974585 0.55374490 0.31095905
## Proportion of Variance 0.7115319 0.1868945 0.05083674 0.03475179 0.01095883
## Cumulative Proportion 0.7115319 0.8984264 0.94926311 0.98401490 0.99497372
## Comp.6 Comp.7 Comp.8 Comp.9
## Standard deviation 0.205937451 0.0345666035 2.728378e-02 0
## Proportion of Variance 0.004806493 0.0001354163 8.436586e-05 0
## Cumulative Proportion 0.999780218 0.9999156341 1.000000e+00 1
De manera similar para 2021, tambien parece ser suficiente con un solo componente y que con ese se explica aproximadamenteel 70% de la varianza, se tendria que seguir analizando para ver si es recomendable usar otro componente mas. A continuacion revisaremos la varianza y los eigenvalores para 2020
fviz_eig(pca2020,choice = "variance")
fviz_eig(pca2020, choice = "eigenvalue")
Con este grafico es posible observar que la mayoria de la varianza se
eencuentra explicada por un solo componente principal Ahora para
2021
fviz_eig(pca2021,choice = "variance")
fviz_eig(pca2021, choice = "eigenvalue")
Para este año tambien es osible observarq ue la mayoria de la varianza
se explica de manera sufieciente con un solo componente.
Analisis Grafico
Representaciones Factoriales y su representacion: 2020
fviz_pca_ind(pca2020,
col.ind = "cos2",
gradient.cols=c("red", "yellow", "green"),
repel = FALSE)
Aqui es posible analizar que el coseno cuadrado indica la proporcion de
la varianza de la variable es explicada por el componente principal,
aquellos valores cercanos a 1, se encuentran bien representados, y de
manera gradual a como va incrementando el color hasta llegar a rojo,
menor representacion tienen las variables, por lo que es posible
explicar que la mayoria de las variables se encuentran bien
representadas por el componente principal, a excepcion de el numero 2,
el 48, y el 26, pero son minoria, asi que no afectan a nuestro
estudio.
2021
fviz_pca_ind(pca2021,
col.ind = "cos2",
gradient.cols=c("red", "yellow", "green"),
repel = FALSE)
Para el año 2021, es posible observar que el unico dato qu no se explica correctamente es el dato 2, per solo es un dato en todo el onjunto de datos, por lo que carece de importancia. Grafico de Cargas Factoriales 2020
fviz_pca_var(pca2020,
col.var = "contrib",
gradient.cols = c('red', 'yellow', 'green'),
repel = FALSE)
Es posible observar que las variables se pueden agrupar en dos
dimensiones, donde una esta conformada por Migracion, Residual y Federal
para el mes de julio y otra dimension para los demas.
2021
fviz_pca_var(pca2021,
col.var = "contrib",
gradient.cols = c('red', 'yellow', 'green'),
repel = FALSE)
Para el año 2021 se divide en tres dimensiones, donde una esta
conformada por Residual y Net domestic migration, otra dimension esta
conformada por net from abroad, y la otra por las demas.
Usaremos un biplot para visualizar las puntuaciones 2020
fviz_pca_biplot(pca2020,
col.var = "red",
col.ind = "black")
Es posible observar que sujetos como el 11 tienden a tener mas
preferencia por Residual en comparacion con otras variables, o el sujeto
10 se eencuentra mas inclinado por movement from abroad
data2020[11,]
## Census Resident Total Population - AB:Qr-1-2020
## 0.43289508
## Resident Total Population Estimate - Jul-1-2020
## 0.43626956
## Net Domestic Migration - Jul-1-2020
## 1.81333937
## Federal/Civilian Movement from Abroad - Jul-1-2020
## 1.13149799
## Net International Migration - Jul-1-2020
## 0.02083766
## Period Births - Jul-1-2020
## 0.58378211
## Period Deaths - Jul-1-2020
## 0.33145645
## Resident Under 65 Population Estimate - Jul-1-2020
## 0.47531658
## Resident 65 Plus Population Estimate - Jul-1-2020
## 0.13600669
## Residual - Jul-1-2020
## 2.78992635
2021
fviz_pca_biplot(pca2021,
col.var = "red",
col.ind = "black")
Es posible observar que sujetos como el 6 tienden a tener mas preferencia por residual en comparacion con otras variables, o el sujeto 48 se encuentra mas inclinado por net domestic migration
data2021[6,]
## Resident Total Population Estimate - Jul-1-2021
## -0.1862544
## Net Domestic Migration - Jul-1-2021
## 0.8179269
## Federal/Civilian Movement from Abroad - Jul-1-2021
## -0.1681316
## Net International Migration - Jul-1-2021
## -0.1141287
## Period Births - Jul-1-2021
## -0.1599211
## Period Deaths - Jul-1-2021
## -0.4113328
## Resident Under 65 Population Estimate - Jul-1-2021
## -0.1627187
## Resident 65 Plus Population Estimate - Jul-1-2021
## -0.3540924
## Residual - Jul-1-2021
## 1.0247546
data2021[48,]
## Resident Total Population Estimate - Jul-1-2021
## 0.064459658
## Net Domestic Migration - Jul-1-2021
## 0.437039714
## Federal/Civilian Movement from Abroad - Jul-1-2021
## -0.772697762
## Net International Migration - Jul-1-2021
## 0.026219017
## Period Births - Jul-1-2021
## 0.009235824
## Period Deaths - Jul-1-2021
## -0.062897824
## Resident Under 65 Population Estimate - Jul-1-2021
## 0.076784090
## Resident 65 Plus Population Estimate - Jul-1-2021
## -0.028443946
## Residual - Jul-1-2021
## 0.381199726
Analisis de Proporcion Spss
psych::cor.plot(data2020)
Es posible observar que la mayoria de las correlacones entre los
componentes es fuerte ya que son muy cercanos a 1
2021
psych::cor.plot(data2021)
Es posible observar que la mayoria de las correlacones entre los componentes es fuerte ya que son muy cercanos a 1
A continuacion aplicaremos rotacion varimax para asi ajustar las cargas de modo que cada variable tenga una carga alta en un solo componente haciendo que la estructura sea mas simple y clara 2020
pca20202 <- psych::principal(data2020, nfactors = 1, residuals =FALSE, rotate ="varimax",
scores = TRUE, oblique.scores = FALSE, method = "regresion",
use = "pairwise", cor = "cor", weight = NULL)
## Warning in cor.smooth(r): Matrix was not positive definite, smoothing was done
## In factor.stats, I could not find the RMSEA upper bound . Sorry about that
## Warning in psych::principal(data2020, nfactors = 1, residuals = FALSE, rotate =
## "varimax", : The matrix is not positive semi-definite, scores found from
## Structure loadings
pca20202
## Principal Components Analysis
## Call: psych::principal(r = data2020, nfactors = 1, residuals = FALSE,
## rotate = "varimax", scores = TRUE, oblique.scores = FALSE,
## method = "regresion", use = "pairwise", cor = "cor", weight = NULL)
## Standardized loadings (pattern matrix) based upon correlation matrix
## PC1 h2 u2 com
## Census Resident Total Population - AB:Qr-1-2020 1.00 0.992 0.0080 1
## Resident Total Population Estimate - Jul-1-2020 1.00 0.992 0.0078 1
## Net Domestic Migration - Jul-1-2020 -0.24 0.059 0.9408 1
## Federal/Civilian Movement from Abroad - Jul-1-2020 0.75 0.569 0.4312 1
## Net International Migration - Jul-1-2020 0.94 0.887 0.1130 1
## Period Births - Jul-1-2020 0.99 0.985 0.0147 1
## Period Deaths - Jul-1-2020 0.97 0.933 0.0668 1
## Resident Under 65 Population Estimate - Jul-1-2020 1.00 0.993 0.0074 1
## Resident 65 Plus Population Estimate - Jul-1-2020 0.96 0.928 0.0721 1
## Residual - Jul-1-2020 0.22 0.047 0.9532 1
##
## PC1
## SS loadings 7.38
## Proportion Var 0.74
##
## Mean item complexity = 1
## Test of the hypothesis that 1 component is sufficient.
##
## The root mean square of the residuals (RMSR) is 0.08
## with the empirical chi square 27.37 with prob < 0.82
##
## Fit based upon off diagonal values = 0.99
2021
pca20212 <- psych::principal(data2021, nfactors = 1, residuals =FALSE, rotate ="varimax",
scores = TRUE, oblique.scores = FALSE, method = "regresion",
use = "pairwise", cor = "cor", weight = NULL)
## Warning in cor.smooth(r): Matrix was not positive definite, smoothing was done
## Warning in psych::principal(data2021, nfactors = 1, residuals = FALSE, rotate =
## "varimax", : The matrix is not positive semi-definite, scores found from
## Structure loadings
pca20212
## Principal Components Analysis
## Call: psych::principal(r = data2021, nfactors = 1, residuals = FALSE,
## rotate = "varimax", scores = TRUE, oblique.scores = FALSE,
## method = "regresion", use = "pairwise", cor = "cor", weight = NULL)
## Standardized loadings (pattern matrix) based upon correlation matrix
## PC1 h2 u2 com
## Resident Total Population Estimate - Jul-1-2021 0.99 0.98 0.016 1
## Net Domestic Migration - Jul-1-2021 -0.10 0.01 0.990 1
## Federal/Civilian Movement from Abroad - Jul-1-2021 -0.76 0.57 0.427 1
## Net International Migration - Jul-1-2021 0.94 0.89 0.111 1
## Period Births - Jul-1-2021 0.99 0.98 0.022 1
## Period Deaths - Jul-1-2021 0.96 0.92 0.081 1
## Resident Under 65 Population Estimate - Jul-1-2021 0.99 0.98 0.016 1
## Resident 65 Plus Population Estimate - Jul-1-2021 0.96 0.92 0.076 1
## Residual - Jul-1-2021 0.38 0.14 0.857 1
##
## PC1
## SS loadings 6.40
## Proportion Var 0.71
##
## Mean item complexity = 1
## Test of the hypothesis that 1 component is sufficient.
##
## The root mean square of the residuals (RMSR) is 0.13
## with the empirical chi square 62.55 with prob < 0.00012
##
## Fit based upon off diagonal values = 0.97
Con ello es posible observar que para ambos años con un solo componente se explica la maxima varianza en el modelo, podrian usarse mas componentes principales, pero es mas factible y optima la representacion de unicamente un componente, por lo que solo se utilizara un componenete para cada año. A continuacion una matriz de coeficientes sobre el componente 2020
pca20202$weights[,1]
## Census Resident Total Population - AB:Qr-1-2020
## 0.9960094
## Resident Total Population Estimate - Jul-1-2020
## 0.9961092
## Net Domestic Migration - Jul-1-2020
## -0.2433636
## Federal/Civilian Movement from Abroad - Jul-1-2020
## 0.7541575
## Net International Migration - Jul-1-2020
## 0.9417826
## Period Births - Jul-1-2020
## 0.9925980
## Period Deaths - Jul-1-2020
## 0.9659996
## Resident Under 65 Population Estimate - Jul-1-2020
## 0.9962822
## Resident 65 Plus Population Estimate - Jul-1-2020
## 0.9632815
## Residual - Jul-1-2020
## 0.2162589
2021
pca20212$weights[,1]
## Resident Total Population Estimate - Jul-1-2021
## 0.9917747
## Net Domestic Migration - Jul-1-2021
## -0.1016653
## Federal/Civilian Movement from Abroad - Jul-1-2021
## -0.7569791
## Net International Migration - Jul-1-2021
## 0.9430850
## Period Births - Jul-1-2021
## 0.9888870
## Period Deaths - Jul-1-2021
## 0.9584075
## Resident Under 65 Population Estimate - Jul-1-2021
## 0.9917319
## Resident 65 Plus Population Estimate - Jul-1-2021
## 0.9613842
## Residual - Jul-1-2021
## 0.3783821
Asi bien podemos obtener nuevas variables para ambos años cuya principal caracteristica es que son ortogonales, este conjunto de variables se redujo unicamente a un componente, explicadas de la siguiente manera
2020
pca20202$scores
## PC1
## [1,] -1.5258931
## [2,] -5.2563647
## [3,] -0.4834442
## [4,] -3.3867482
## [5,] 35.1607029
## [6,] -1.4699227
## [7,] -2.7385370
## [8,] -5.4149179
## [9,] -5.3763381
## [10,] 13.2696113
## [11,] 3.4049473
## [12,] -4.0868235
## [13,] -4.8201033
## [14,] 8.2824724
## [15,] -0.2305478
## [16,] -3.2919770
## [17,] -3.0463597
## [18,] -1.7426352
## [19,] -1.2611356
## [20,] -4.9806910
## [21,] -0.1513465
## [22,] 0.4469126
## [23,] 3.4809136
## [24,] -1.4130938
## [25,] -2.9808749
## [26,] -0.2538187
## [27,] -5.2840724
## [28,] -4.3475150
## [29,] -3.8984811
## [30,] -5.1237520
## [31,] 3.1064299
## [32,] -4.1333851
## [33,] 16.3028103
## [34,] 3.4855902
## [35,] -5.4137779
## [36,] 5.1414863
## [37,] -2.2062903
## [38,] -2.8184925
## [39,] 6.2407020
## [40,] -5.1541031
## [41,] -1.7135265
## [42,] -5.4270185
## [43,] -0.4144366
## [44,] 18.4379004
## [45,] -3.8242585
## [46,] -5.6845281
## [47,] 3.5835551
## [48,] 0.1735260
## [49,] -4.3704598
## [50,] -1.1169911
## [51,] -5.6748993
2021
pca20212$scores
## PC1
## [1,] -1.53327597
## [2,] -4.51280311
## [3,] -0.02019033
## [4,] -2.94940245
## [5,] 30.46215897
## [6,] -0.91464984
## [7,] -2.57475141
## [8,] -4.67144483
## [9,] -4.71328295
## [10,] 13.11522707
## [11,] 3.81496428
## [12,] -3.50507534
## [13,] -4.09328054
## [14,] 7.03596935
## [15,] -0.40159893
## [16,] -2.98238837
## [17,] -2.60799916
## [18,] -1.61605543
## [19,] -1.21403130
## [20,] -4.32002652
## [21,] -0.17423794
## [22,] -0.23961009
## [23,] 2.64771663
## [24,] -1.30336042
## [25,] -2.58233216
## [26,] -0.42770220
## [27,] -4.54503175
## [28,] -3.77546455
## [29,] -2.84370072
## [30,] -4.41089684
## [31,] 2.41782296
## [32,] -3.65924022
## [33,] 12.86670091
## [34,] 3.40910564
## [35,] -4.66653745
## [36,] 3.81121371
## [37,] -1.90446894
## [38,] -2.53875309
## [39,] 4.61200726
## [40,] -4.48423345
## [41,] -1.50262576
## [42,] -4.62776319
## [43,] -0.41892868
## [44,] 16.04524610
## [45,] -3.31479665
## [46,] -4.91349956
## [47,] 3.99093786
## [48,] 0.77103477
## [49,] -3.86692936
## [50,] -1.26216531
## [51,] -4.90757073