This study performs a principal component analysis (PCA) on sixteen Gallup polling variables to uncover patterns in the data and to identify smaller groups of related items within the larger set.
PCA allows us to summarize the variation (information) in a data set described by multiple variables. Each variable can be regarded as a separate dimension, and once a data set has more than three variables it becomes very difficult to visualize as a multi-dimensional hyperspace.
The goal of PCA is to transform the initial variables into a new set of variables that explain the variation in the data. These new variables are linear combinations of the originals and are called principal components.
PCA reduces the dimensionality of multivariate data to two or three principal components that can be visualized graphically with minimal loss of information.
Several functions from different packages are available in R for performing PCA, e.g., prcomp and princomp (built-in R stats package) and PCA in the FactoMineR package.
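As a minimal sketch of these ideas, the base-R function prcomp can be run on a small built-in data set. USArrests is used here purely for illustration and is not part of the Gallup study. The sketch shows the share of variance carried by each principal component and confirms that the component scores are linear combinations of the standardized original variables.

```r
# Illustrative only: USArrests ships with base R and is unrelated to the Gallup data.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_explained <- cumsum(var_explained)

# Each principal component is a linear combination of the standardized variables:
# scores = standardized data %*% loadings (the rotation matrix)
scores_manual <- scale(USArrests) %*% pca$rotation

# The largest difference from the scores returned by prcomp should be
# at the level of floating-point noise
max_diff <- max(abs(scores_manual - pca$x))

print(round(var_explained, 3))
print(round(cum_explained, 3))
print(max_diff)
```

If the first two or three entries of cum_explained are close to 1, a two- or three-dimensional plot of the scores loses little information.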
This study will demonstrate how to perform a principal component analysis using R software and the FactoMineR package. It will also show how to visualize the output of the PCA using the R package factoextra.
The Gallup polling variables are monthly public opinion and attitude measures from the Gallup Organization, e.g., Economic Confidence, Presidential Approval, Economy Good, etc. The data cover January 2004 through July 2017.
The PCA study will be conducted using FactoMineR (Husson et al.), one of the most powerful R packages and a good choice for performing a multivariate exploratory data analysis.
FactoMineR can be installed and loaded as follows:
setwd("C:/Users/Michael/Desktop/R Code and Data/data/gallup")
# ipak function: install and load multiple R packages.
# check to see if packages are installed. Install them if they are not,
# then load them into the R session.
ipak <- function(pkg) {
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg))
    install.packages(new.pkg, dependencies = TRUE)
  sapply(pkg, require, character.only = TRUE)
}
# list of packages required for this analysis
packages <- c("FactoMineR", ##
"FactoInvestigate", ##
"missMDA", ##
"corrplot",
"PerformanceAnalytics",
"readr", ## ** File processing **
"plyr", ## ** Data wrangling **
"dplyr", ## ** Data wrangling **
"reshape2", ## ** Data wrangling **
"knitr", ## ** Reproducibility **
"devtools", ## ** R package tools **
"DT", ## ** DataTables **
"xtable", ## ** Export tables to HTML **
"pander", ## ** Pandoc writer for R **
"Hmisc", ## ** Data cleansing (imputation) **
"ggplot2") ## ** Plotting and Visualization **
ipak(packages)
## FactoMineR FactoInvestigate missMDA
## TRUE TRUE TRUE
## corrplot PerformanceAnalytics readr
## TRUE TRUE TRUE
## plyr dplyr reshape2
## TRUE TRUE TRUE
## knitr devtools DT
## TRUE TRUE TRUE
## xtable pander Hmisc
## TRUE TRUE TRUE
## ggplot2
## TRUE
# Load it
library(FactoMineR)
The package factoextra has flexible methods for the classes PCA, prcomp, princomp, and dudi in order to quickly extract and visualize the results of the analysis. The ggplot2 plotting system is used for the data visualization.
Install and load package factoextra as follows:
# Install and load factoextra package
library("devtools")
install_github("kassambara/factoextra")
# Load it
library("factoextra")
# read the CSV flat file containing summarized Gallup polling data (1/2004-7/2017):
gallup_vars <-
read_csv("gallup_all_vars.csv", col_names = TRUE)
## Parsed with column specification:
## cols(
## EffDate = col_character(),
## CongressionalApproval = col_double(),
## CongressionalDisapproval = col_double(),
## GOPApproval = col_double(),
## IndependentPartyApproval = col_double(),
## DemocraticPartyIdentification = col_double(),
## GoodTimeFindQualityJob = col_double(),
## BadTimeFindQualityJob = col_double(),
## PresidentialApproval = col_double(),
## EconIssuesMostImpProblem = col_double(),
## EconomyExcellent = col_double(),
## EconomyGood = col_double(),
## EconomyFair = col_double(),
## EconomyPoor = col_double(),
## NationalEconomyGettingBetter = col_double(),
## NationalEconomyGettingWorse = col_double(),
## EconomicConfidenceIndex = col_integer()
## )
# examine first few rows of data
head(gallup_vars[, 1:6])
## # A tibble: 6 x 6
## EffDate CongressionalApproval CongressionalDisapproval GOPApproval
## <chr> <dbl> <dbl> <dbl>
## 1 1/1/2004 0.48 0.45 0.32
## 2 2/1/2004 0.41 0.51 0.31
## 3 3/1/2004 0.42 0.51 0.33
## 4 4/1/2004 0.43 0.51 0.33
## 5 5/1/2004 0.41 0.52 0.34
## 6 6/1/2004 0.41 0.52 0.32
## # ... with 2 more variables: IndependentPartyApproval <dbl>,
## # DemocraticPartyIdentification <dbl>
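The EffDate column is read as character in month/day/year form. As a hedged sketch, such strings can be converted with as.Date(). The three sample values below are hypothetical stand-ins in the same layout, and the format string is an assumption based on the rows printed above. The monthly sequence at the end is a quick sanity check on the stated date range.

```r
# Hypothetical sample values in the same layout as EffDate (month/day/year)
eff_date <- c("1/1/2004", "2/1/2004", "7/1/2017")

# Assumed format: month, day, four-digit year separated by slashes
parsed <- as.Date(eff_date, format = "%m/%d/%Y")
print(parsed)

# One observation per month from January 2004 through July 2017
months <- seq(as.Date("2004-01-01"), as.Date("2017-07-01"), by = "month")
n_months <- length(months)
print(n_months)  # 163 months: 13 full years (156) plus 7 months of 2017
```

A monthly series over this range has 163 points, so a file with one row per month should have 163 rows.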
# examine data stats and look for missing values
summary(gallup_vars)
## EffDate CongressionalApproval CongressionalDisapproval
## Length:163 Min. :0.0900 Min. :0.4500
## Class :character 1st Qu.:0.1600 1st Qu.:0.6500
## Mode :character Median :0.2000 Median :0.7400
## Mean :0.2233 Mean :0.7158
## 3rd Qu.:0.2700 3rd Qu.:0.7950
## Max. :0.4800 Max. :0.8600
## GOPApproval IndependentPartyApproval DemocraticPartyIdentification
## Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.2638 1st Qu.:0.3500 1st Qu.:0.3020
## Median :0.2730 Median :0.3750 Median :0.3150
## Mean :0.2811 Mean :0.3679 Mean :0.3185
## 3rd Qu.:0.2890 3rd Qu.:0.3970 3rd Qu.:0.3400
## Max. :0.3800 Max. :0.4350 Max. :0.3850
## GoodTimeFindQualityJob BadTimeFindQualityJob PresidentialApproval
## Min. :0.0800 Min. :0.3600 Min. :0.2500
## 1st Qu.:0.1950 1st Qu.:0.5500 1st Qu.:0.4100
## Median :0.3400 Median :0.6100 Median :0.4600
## Mean :0.3028 Mean :0.6596 Mean :0.4472
## 3rd Qu.:0.4100 3rd Qu.:0.7650 3rd Qu.:0.4943
## Max. :0.5800 Max. :0.9000 Max. :0.6600
## EconIssuesMostImpProblem EconomyExcellent EconomyGood
## Min. :0.0800 Min. :0.01000 Min. :0.0600
## 1st Qu.:0.3100 1st Qu.:0.01000 1st Qu.:0.1250
## Median :0.4100 Median :0.02000 Median :0.2000
## Mean :0.4637 Mean :0.02693 Mean :0.2067
## 3rd Qu.:0.6400 3rd Qu.:0.03000 3rd Qu.:0.2800
## Max. :0.8600 Max. :0.11000 Max. :0.4100
## EconomyFair EconomyPoor NationalEconomyGettingBetter
## Min. :0.3000 Min. :0.1500 Min. :0.0900
## 1st Qu.:0.4200 1st Qu.:0.2300 1st Qu.:0.3000
## Median :0.4400 Median :0.3100 Median :0.3800
## Mean :0.4312 Mean :0.3317 Mean :0.3544
## 3rd Qu.:0.4500 3rd Qu.:0.4250 3rd Qu.:0.4200
## Max. :0.4900 Max. :0.6400 Max. :0.6600
## NationalEconomyGettingWorse EconomicConfidenceIndex
## Min. :0.2700 Min. :-65.00
## 1st Qu.:0.5200 1st Qu.:-27.00
## Median :0.5600 Median :-15.00
## Mean :0.5823 Mean :-16.44
## 3rd Qu.:0.6200 3rd Qu.: -4.50
## Max. :0.8800 Max. : 33.00
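summary() lists an NA count for a column only when that column contains missing values, so their absence has to be read off by eye. A more direct check is sketched below on a hypothetical toy data frame (the column names a and b are made up). It also counts exact zeros, because the 0.0000 minima shown for the three party variables sit well below their first quartiles; whether those zeros are real readings or placeholders is not stated here, so treat that part as a prompt for inspection rather than a finding.

```r
# Hypothetical toy data standing in for a few polling columns
toy <- data.frame(
  a = c(0.32, 0.31, NA, 0.33),
  b = c(0.35, 0.00, 0.37, 0.36)
)

# Number of missing values per column
na_counts <- colSums(is.na(toy))

# Number of exact zeros per column (NAs ignored); zeros may or may not be placeholders
zero_counts <- colSums(toy == 0, na.rm = TRUE)

print(na_counts)
print(zero_counts)
```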
# Extract the active rows and variables for the PCA:
# Active rows - all 163 monthly observations (January 2004 through July 2017) are kept.
# Active variables (columns 2:17) - sixteen Gallup variables used for the PCA.
colnames(gallup_vars)
## [1] "EffDate" "CongressionalApproval"
## [3] "CongressionalDisapproval" "GOPApproval"
## [5] "IndependentPartyApproval" "DemocraticPartyIdentification"
## [7] "GoodTimeFindQualityJob" "BadTimeFindQualityJob"
## [9] "PresidentialApproval" "EconIssuesMostImpProblem"
## [11] "EconomyExcellent" "EconomyGood"
## [13] "EconomyFair" "EconomyPoor"
## [15] "NationalEconomyGettingBetter" "NationalEconomyGettingWorse"
## [17] "EconomicConfidenceIndex"
gallup_vars.active <- gallup_vars[, 2:17]
head(gallup_vars.active[, 1:6])
## # A tibble: 6 x 6
## CongressionalApproval CongressionalDisapproval GOPApproval
## <dbl> <dbl> <dbl>
## 1 0.48 0.45 0.32
## 2 0.41 0.51 0.31
## 3 0.42 0.51 0.33
## 4 0.43 0.51 0.33
## 5 0.41 0.52 0.34
## 6 0.41 0.52 0.32
## # ... with 3 more variables: IndependentPartyApproval <dbl>,
## # DemocraticPartyIdentification <dbl>, GoodTimeFindQualityJob <dbl>
Before running the PCA, we explore the data with descriptive statistics, a correlation matrix, and a scatter plot matrix.
gallup_vars.active_stats <- data.frame(
  Min  = apply(gallup_vars.active, 2, min),            # minimum
  Q1   = apply(gallup_vars.active, 2, quantile, 1/4),  # first quartile
  Med  = apply(gallup_vars.active, 2, median),         # median
  Mean = apply(gallup_vars.active, 2, mean),           # mean
  Q3   = apply(gallup_vars.active, 2, quantile, 3/4),  # third quartile
  Max  = apply(gallup_vars.active, 2, max)             # maximum
)
gallup_vars.active_stats <- round(gallup_vars.active_stats, 1)
head(gallup_vars.active_stats)
## Min Q1 Med Mean Q3 Max
## CongressionalApproval 0.1 0.2 0.2 0.2 0.3 0.5
## CongressionalDisapproval 0.4 0.6 0.7 0.7 0.8 0.9
## GOPApproval 0.0 0.3 0.3 0.3 0.3 0.4
## IndependentPartyApproval 0.0 0.4 0.4 0.4 0.4 0.4
## DemocraticPartyIdentification 0.0 0.3 0.3 0.3 0.3 0.4
## GoodTimeFindQualityJob 0.1 0.2 0.3 0.3 0.4 0.6
Note that the built-in R function summary() also produces descriptive statistics, but its output for a data frame can be hard to read.
The correlation between variables can be calculated as follows:
cor.mat <- round(cor(gallup_vars.active, method="pearson"),2)
head(cor.mat[, 1:6])
## CongressionalApproval
## CongressionalApproval 1.00
## CongressionalDisapproval -0.99
## GOPApproval 0.59
## IndependentPartyApproval -0.55
## DemocraticPartyIdentification 0.34
## GoodTimeFindQualityJob 0.19
## CongressionalDisapproval GOPApproval
## CongressionalApproval -0.99 0.59
## CongressionalDisapproval 1.00 -0.56
## GOPApproval -0.56 1.00
## IndependentPartyApproval 0.56 -0.14
## DemocraticPartyIdentification -0.34 0.60
## GoodTimeFindQualityJob -0.19 0.19
## IndependentPartyApproval
## CongressionalApproval -0.55
## CongressionalDisapproval 0.56
## GOPApproval -0.14
## IndependentPartyApproval 1.00
## DemocraticPartyIdentification 0.07
## GoodTimeFindQualityJob -0.11
## DemocraticPartyIdentification
## CongressionalApproval 0.34
## CongressionalDisapproval -0.34
## GOPApproval 0.60
## IndependentPartyApproval 0.07
## DemocraticPartyIdentification 1.00
## GoodTimeFindQualityJob 0.03
## GoodTimeFindQualityJob
## CongressionalApproval 0.19
## CongressionalDisapproval -0.19
## GOPApproval 0.19
## IndependentPartyApproval -0.11
## DemocraticPartyIdentification 0.03
## GoodTimeFindQualityJob 1.00
The function rcorr() in the Hmisc package can be used to compute the significance levels for Pearson and Spearman correlations. It returns both the correlation coefficients and the p-value of the correlation for all possible pairs of columns in the data table.
library(Hmisc)
rcorr.mat <- rcorr(as.matrix(gallup_vars.active))
head(rcorr.mat,1)
## $r
## CongressionalApproval
## CongressionalApproval 1.0000000
## CongressionalDisapproval -0.9875522
## GOPApproval 0.5855030
## IndependentPartyApproval -0.5464272
## DemocraticPartyIdentification 0.3443272
## GoodTimeFindQualityJob 0.1869320
## BadTimeFindQualityJob -0.1854677
## PresidentialApproval 0.2225860
## EconIssuesMostImpProblem -0.2743874
## EconomyExcellent 0.3031937
## EconomyGood 0.5650913
## EconomyFair -0.1641691
## EconomyPoor -0.4221181
## NationalEconomyGettingBetter 0.1682397
## NationalEconomyGettingWorse -0.2966925
## EconomicConfidenceIndex 0.4299047
## CongressionalDisapproval GOPApproval
## CongressionalApproval -0.9875522 0.58550298
## CongressionalDisapproval 1.0000000 -0.56007004
## GOPApproval -0.5600700 1.00000000
## IndependentPartyApproval 0.5643029 -0.13592984
## DemocraticPartyIdentification -0.3373878 0.60066056
## GoodTimeFindQualityJob -0.1895186 0.19385727
## BadTimeFindQualityJob 0.1899981 -0.17974387
## PresidentialApproval -0.1746795 0.07124973
## EconIssuesMostImpProblem 0.2632108 -0.33824196
## EconomyExcellent -0.3122056 0.28591835
## EconomyGood -0.5574511 0.53546351
## EconomyFair 0.1801822 0.05031211
## EconomyPoor 0.4143568 -0.46309128
## NationalEconomyGettingBetter -0.1203981 0.15808661
## NationalEconomyGettingWorse 0.2514772 -0.26916093
## EconomicConfidenceIndex -0.3983937 0.42258760
## IndependentPartyApproval
## CongressionalApproval -0.54642725
## CongressionalDisapproval 0.56430292
## GOPApproval -0.13592984
## IndependentPartyApproval 1.00000000
## DemocraticPartyIdentification 0.07082908
## GoodTimeFindQualityJob -0.10914844
## BadTimeFindQualityJob 0.11957784
## PresidentialApproval -0.19479504
## EconIssuesMostImpProblem 0.12851901
## EconomyExcellent -0.20641293
## EconomyGood -0.34615344
## EconomyFair 0.12156745
## EconomyPoor 0.26416707
## NationalEconomyGettingBetter -0.15463102
## NationalEconomyGettingWorse 0.23454432
## EconomicConfidenceIndex -0.29717180
## DemocraticPartyIdentification
## CongressionalApproval 0.34432718
## CongressionalDisapproval -0.33738780
## GOPApproval 0.60066056
## IndependentPartyApproval 0.07082908
## DemocraticPartyIdentification 1.00000000
## GoodTimeFindQualityJob 0.02684049
## BadTimeFindQualityJob -0.03971621
## PresidentialApproval -0.02656408
## EconIssuesMostImpProblem -0.03842990
## EconomyExcellent 0.20025048
## EconomyGood 0.13251662
## EconomyFair -0.32990655
## EconomyPoor -0.03046459
## NationalEconomyGettingBetter -0.22954443
## NationalEconomyGettingWorse 0.15830640
## EconomicConfidenceIndex -0.06126272
## GoodTimeFindQualityJob BadTimeFindQualityJob
## CongressionalApproval 0.18693204 -0.18546768
## CongressionalDisapproval -0.18951865 0.18999811
## GOPApproval 0.19385727 -0.17974387
## IndependentPartyApproval -0.10914844 0.11957784
## DemocraticPartyIdentification 0.02684049 -0.03971621
## GoodTimeFindQualityJob 1.00000000 -0.99634165
## BadTimeFindQualityJob -0.99634165 1.00000000
## PresidentialApproval -0.32575768 0.34725454
## EconIssuesMostImpProblem -0.87116969 0.85677624
## EconomyExcellent 0.68063915 -0.68506318
## EconomyGood 0.79419756 -0.78634655
## EconomyFair 0.30673668 -0.28007206
## EconomyPoor -0.79958552 0.78620100
## NationalEconomyGettingBetter 0.09783312 -0.06698129
## NationalEconomyGettingWorse -0.16194813 0.13182797
## EconomicConfidenceIndex 0.56908137 -0.54517692
## PresidentialApproval
## CongressionalApproval 0.22258599
## CongressionalDisapproval -0.17467946
## GOPApproval 0.07124973
## IndependentPartyApproval -0.19479504
## DemocraticPartyIdentification -0.02656408
## GoodTimeFindQualityJob -0.32575768
## BadTimeFindQualityJob 0.34725454
## PresidentialApproval 1.00000000
## EconIssuesMostImpProblem 0.28078634
## EconomyExcellent -0.38563430
## EconomyGood -0.19438310
## EconomyFair 0.01218046
## EconomyPoor 0.20409261
## NationalEconomyGettingBetter 0.52043772
## NationalEconomyGettingWorse -0.48030654
## EconomicConfidenceIndex 0.16162841
## EconIssuesMostImpProblem EconomyExcellent
## CongressionalApproval -0.2743874 0.30319369
## CongressionalDisapproval 0.2632108 -0.31220561
## GOPApproval -0.3382420 0.28591835
## IndependentPartyApproval 0.1285190 -0.20641293
## DemocraticPartyIdentification -0.0384299 0.20025048
## GoodTimeFindQualityJob -0.8711697 0.68063915
## BadTimeFindQualityJob 0.8567762 -0.68506318
## PresidentialApproval 0.2807863 -0.38563430
## EconIssuesMostImpProblem 1.0000000 -0.76395047
## EconomyExcellent -0.7639505 1.00000000
## EconomyGood -0.8653511 0.77358299
## EconomyFair -0.4181415 0.05637205
## EconomyPoor 0.9003488 -0.75651586
## NationalEconomyGettingBetter -0.2671433 0.09074272
## NationalEconomyGettingWorse 0.3568509 -0.20934449
## EconomicConfidenceIndex -0.7301230 0.57749361
## EconomyGood EconomyFair EconomyPoor
## CongressionalApproval 0.5650913 -0.16416907 -0.42211813
## CongressionalDisapproval -0.5574511 0.18018225 0.41435677
## GOPApproval 0.5354635 0.05031211 -0.46309128
## IndependentPartyApproval -0.3461534 0.12156745 0.26416707
## DemocraticPartyIdentification 0.1325166 -0.32990655 -0.03046459
## GoodTimeFindQualityJob 0.7941976 0.30673668 -0.79958552
## BadTimeFindQualityJob -0.7863466 -0.28007206 0.78620100
## PresidentialApproval -0.1943831 0.01218046 0.20409261
## EconIssuesMostImpProblem -0.8653511 -0.41814154 0.90034884
## EconomyExcellent 0.7735830 0.05637205 -0.75651586
## EconomyGood 1.0000000 0.26472679 -0.95670152
## EconomyFair 0.2647268 1.00000000 -0.51319015
## EconomyPoor -0.9567015 -0.51319015 1.00000000
## NationalEconomyGettingBetter 0.3032012 0.51364684 -0.40204749
## NationalEconomyGettingWorse -0.4294280 -0.50058699 0.51290995
## EconomicConfidenceIndex 0.8139009 0.52972317 -0.86882377
## NationalEconomyGettingBetter
## CongressionalApproval 0.16823968
## CongressionalDisapproval -0.12039809
## GOPApproval 0.15808661
## IndependentPartyApproval -0.15463102
## DemocraticPartyIdentification -0.22954443
## GoodTimeFindQualityJob 0.09783312
## BadTimeFindQualityJob -0.06698129
## PresidentialApproval 0.52043772
## EconIssuesMostImpProblem -0.26714334
## EconomyExcellent 0.09074272
## EconomyGood 0.30320123
## EconomyFair 0.51364684
## EconomyPoor -0.40204749
## NationalEconomyGettingBetter 1.00000000
## NationalEconomyGettingWorse -0.97982675
## EconomicConfidenceIndex 0.79125065
## NationalEconomyGettingWorse
## CongressionalApproval -0.2966925
## CongressionalDisapproval 0.2514772
## GOPApproval -0.2691609
## IndependentPartyApproval 0.2345443
## DemocraticPartyIdentification 0.1583064
## GoodTimeFindQualityJob -0.1619481
## BadTimeFindQualityJob 0.1318280
## PresidentialApproval -0.4803065
## EconIssuesMostImpProblem 0.3568509
## EconomyExcellent -0.2093445
## EconomyGood -0.4294280
## EconomyFair -0.5005870
## EconomyPoor 0.5129099
## NationalEconomyGettingBetter -0.9798267
## NationalEconomyGettingWorse 1.0000000
## EconomicConfidenceIndex -0.8645731
## EconomicConfidenceIndex
## CongressionalApproval 0.42990467
## CongressionalDisapproval -0.39839372
## GOPApproval 0.42258760
## IndependentPartyApproval -0.29717180
## DemocraticPartyIdentification -0.06126272
## GoodTimeFindQualityJob 0.56908137
## BadTimeFindQualityJob -0.54517692
## PresidentialApproval 0.16162841
## EconIssuesMostImpProblem -0.73012298
## EconomyExcellent 0.57749361
## EconomyGood 0.81390089
## EconomyFair 0.52972317
## EconomyPoor -0.86882377
## NationalEconomyGettingBetter 0.79125065
## NationalEconomyGettingWorse -0.86457312
## EconomicConfidenceIndex 1.00000000
# Extract the correlation coefficients
rcorr.mat$r
## CongressionalApproval
## CongressionalApproval 1.0000000
## CongressionalDisapproval -0.9875522
## GOPApproval 0.5855030
## IndependentPartyApproval -0.5464272
## DemocraticPartyIdentification 0.3443272
## GoodTimeFindQualityJob 0.1869320
## BadTimeFindQualityJob -0.1854677
## PresidentialApproval 0.2225860
## EconIssuesMostImpProblem -0.2743874
## EconomyExcellent 0.3031937
## EconomyGood 0.5650913
## EconomyFair -0.1641691
## EconomyPoor -0.4221181
## NationalEconomyGettingBetter 0.1682397
## NationalEconomyGettingWorse -0.2966925
## EconomicConfidenceIndex 0.4299047
## CongressionalDisapproval GOPApproval
## CongressionalApproval -0.9875522 0.58550298
## CongressionalDisapproval 1.0000000 -0.56007004
## GOPApproval -0.5600700 1.00000000
## IndependentPartyApproval 0.5643029 -0.13592984
## DemocraticPartyIdentification -0.3373878 0.60066056
## GoodTimeFindQualityJob -0.1895186 0.19385727
## BadTimeFindQualityJob 0.1899981 -0.17974387
## PresidentialApproval -0.1746795 0.07124973
## EconIssuesMostImpProblem 0.2632108 -0.33824196
## EconomyExcellent -0.3122056 0.28591835
## EconomyGood -0.5574511 0.53546351
## EconomyFair 0.1801822 0.05031211
## EconomyPoor 0.4143568 -0.46309128
## NationalEconomyGettingBetter -0.1203981 0.15808661
## NationalEconomyGettingWorse 0.2514772 -0.26916093
## EconomicConfidenceIndex -0.3983937 0.42258760
## IndependentPartyApproval
## CongressionalApproval -0.54642725
## CongressionalDisapproval 0.56430292
## GOPApproval -0.13592984
## IndependentPartyApproval 1.00000000
## DemocraticPartyIdentification 0.07082908
## GoodTimeFindQualityJob -0.10914844
## BadTimeFindQualityJob 0.11957784
## PresidentialApproval -0.19479504
## EconIssuesMostImpProblem 0.12851901
## EconomyExcellent -0.20641293
## EconomyGood -0.34615344
## EconomyFair 0.12156745
## EconomyPoor 0.26416707
## NationalEconomyGettingBetter -0.15463102
## NationalEconomyGettingWorse 0.23454432
## EconomicConfidenceIndex -0.29717180
## DemocraticPartyIdentification
## CongressionalApproval 0.34432718
## CongressionalDisapproval -0.33738780
## GOPApproval 0.60066056
## IndependentPartyApproval 0.07082908
## DemocraticPartyIdentification 1.00000000
## GoodTimeFindQualityJob 0.02684049
## BadTimeFindQualityJob -0.03971621
## PresidentialApproval -0.02656408
## EconIssuesMostImpProblem -0.03842990
## EconomyExcellent 0.20025048
## EconomyGood 0.13251662
## EconomyFair -0.32990655
## EconomyPoor -0.03046459
## NationalEconomyGettingBetter -0.22954443
## NationalEconomyGettingWorse 0.15830640
## EconomicConfidenceIndex -0.06126272
## GoodTimeFindQualityJob BadTimeFindQualityJob
## CongressionalApproval 0.18693204 -0.18546768
## CongressionalDisapproval -0.18951865 0.18999811
## GOPApproval 0.19385727 -0.17974387
## IndependentPartyApproval -0.10914844 0.11957784
## DemocraticPartyIdentification 0.02684049 -0.03971621
## GoodTimeFindQualityJob 1.00000000 -0.99634165
## BadTimeFindQualityJob -0.99634165 1.00000000
## PresidentialApproval -0.32575768 0.34725454
## EconIssuesMostImpProblem -0.87116969 0.85677624
## EconomyExcellent 0.68063915 -0.68506318
## EconomyGood 0.79419756 -0.78634655
## EconomyFair 0.30673668 -0.28007206
## EconomyPoor -0.79958552 0.78620100
## NationalEconomyGettingBetter 0.09783312 -0.06698129
## NationalEconomyGettingWorse -0.16194813 0.13182797
## EconomicConfidenceIndex 0.56908137 -0.54517692
## PresidentialApproval
## CongressionalApproval 0.22258599
## CongressionalDisapproval -0.17467946
## GOPApproval 0.07124973
## IndependentPartyApproval -0.19479504
## DemocraticPartyIdentification -0.02656408
## GoodTimeFindQualityJob -0.32575768
## BadTimeFindQualityJob 0.34725454
## PresidentialApproval 1.00000000
## EconIssuesMostImpProblem 0.28078634
## EconomyExcellent -0.38563430
## EconomyGood -0.19438310
## EconomyFair 0.01218046
## EconomyPoor 0.20409261
## NationalEconomyGettingBetter 0.52043772
## NationalEconomyGettingWorse -0.48030654
## EconomicConfidenceIndex 0.16162841
## EconIssuesMostImpProblem EconomyExcellent
## CongressionalApproval -0.2743874 0.30319369
## CongressionalDisapproval 0.2632108 -0.31220561
## GOPApproval -0.3382420 0.28591835
## IndependentPartyApproval 0.1285190 -0.20641293
## DemocraticPartyIdentification -0.0384299 0.20025048
## GoodTimeFindQualityJob -0.8711697 0.68063915
## BadTimeFindQualityJob 0.8567762 -0.68506318
## PresidentialApproval 0.2807863 -0.38563430
## EconIssuesMostImpProblem 1.0000000 -0.76395047
## EconomyExcellent -0.7639505 1.00000000
## EconomyGood -0.8653511 0.77358299
## EconomyFair -0.4181415 0.05637205
## EconomyPoor 0.9003488 -0.75651586
## NationalEconomyGettingBetter -0.2671433 0.09074272
## NationalEconomyGettingWorse 0.3568509 -0.20934449
## EconomicConfidenceIndex -0.7301230 0.57749361
## EconomyGood EconomyFair EconomyPoor
## CongressionalApproval 0.5650913 -0.16416907 -0.42211813
## CongressionalDisapproval -0.5574511 0.18018225 0.41435677
## GOPApproval 0.5354635 0.05031211 -0.46309128
## IndependentPartyApproval -0.3461534 0.12156745 0.26416707
## DemocraticPartyIdentification 0.1325166 -0.32990655 -0.03046459
## GoodTimeFindQualityJob 0.7941976 0.30673668 -0.79958552
## BadTimeFindQualityJob -0.7863466 -0.28007206 0.78620100
## PresidentialApproval -0.1943831 0.01218046 0.20409261
## EconIssuesMostImpProblem -0.8653511 -0.41814154 0.90034884
## EconomyExcellent 0.7735830 0.05637205 -0.75651586
## EconomyGood 1.0000000 0.26472679 -0.95670152
## EconomyFair 0.2647268 1.00000000 -0.51319015
## EconomyPoor -0.9567015 -0.51319015 1.00000000
## NationalEconomyGettingBetter 0.3032012 0.51364684 -0.40204749
## NationalEconomyGettingWorse -0.4294280 -0.50058699 0.51290995
## EconomicConfidenceIndex 0.8139009 0.52972317 -0.86882377
## NationalEconomyGettingBetter
## CongressionalApproval 0.16823968
## CongressionalDisapproval -0.12039809
## GOPApproval 0.15808661
## IndependentPartyApproval -0.15463102
## DemocraticPartyIdentification -0.22954443
## GoodTimeFindQualityJob 0.09783312
## BadTimeFindQualityJob -0.06698129
## PresidentialApproval 0.52043772
## EconIssuesMostImpProblem -0.26714334
## EconomyExcellent 0.09074272
## EconomyGood 0.30320123
## EconomyFair 0.51364684
## EconomyPoor -0.40204749
## NationalEconomyGettingBetter 1.00000000
## NationalEconomyGettingWorse -0.97982675
## EconomicConfidenceIndex 0.79125065
## NationalEconomyGettingWorse
## CongressionalApproval -0.2966925
## CongressionalDisapproval 0.2514772
## GOPApproval -0.2691609
## IndependentPartyApproval 0.2345443
## DemocraticPartyIdentification 0.1583064
## GoodTimeFindQualityJob -0.1619481
## BadTimeFindQualityJob 0.1318280
## PresidentialApproval -0.4803065
## EconIssuesMostImpProblem 0.3568509
## EconomyExcellent -0.2093445
## EconomyGood -0.4294280
## EconomyFair -0.5005870
## EconomyPoor 0.5129099
## NationalEconomyGettingBetter -0.9798267
## NationalEconomyGettingWorse 1.0000000
## EconomicConfidenceIndex -0.8645731
## EconomicConfidenceIndex
## CongressionalApproval 0.42990467
## CongressionalDisapproval -0.39839372
## GOPApproval 0.42258760
## IndependentPartyApproval -0.29717180
## DemocraticPartyIdentification -0.06126272
## GoodTimeFindQualityJob 0.56908137
## BadTimeFindQualityJob -0.54517692
## PresidentialApproval 0.16162841
## EconIssuesMostImpProblem -0.73012298
## EconomyExcellent 0.57749361
## EconomyGood 0.81390089
## EconomyFair 0.52972317
## EconomyPoor -0.86882377
## NationalEconomyGettingBetter 0.79125065
## NationalEconomyGettingWorse -0.86457312
## EconomicConfidenceIndex 1.00000000
# Extract the p-values
rcorr.mat$P
## CongressionalApproval
## CongressionalApproval NA
## CongressionalDisapproval 0.000000e+00
## GOPApproval 2.220446e-16
## IndependentPartyApproval 4.529710e-14
## DemocraticPartyIdentification 6.774719e-06
## GoodTimeFindQualityJob 1.687949e-02
## BadTimeFindQualityJob 1.777501e-02
## PresidentialApproval 4.292395e-03
## EconIssuesMostImpProblem 3.934164e-04
## EconomyExcellent 8.349522e-05
## EconomyGood 3.996803e-15
## EconomyFair 3.625379e-02
## EconomyPoor 1.996437e-08
## NationalEconomyGettingBetter 3.181431e-02
## NationalEconomyGettingWorse 1.201975e-04
## EconomicConfidenceIndex 1.021924e-08
## CongressionalDisapproval GOPApproval
## CongressionalApproval 0.000000e+00 2.220446e-16
## CongressionalDisapproval NA 7.549517e-15
## GOPApproval 7.549517e-15 NA
## IndependentPartyApproval 4.440892e-15 8.360889e-02
## DemocraticPartyIdentification 1.061928e-05 0.000000e+00
## GoodTimeFindQualityJob 1.539328e-02 1.315576e-02
## BadTimeFindQualityJob 1.513070e-02 2.168236e-02
## PresidentialApproval 2.573616e-02 3.660991e-01
## EconIssuesMostImpProblem 6.871645e-04 1.005354e-05
## EconomyExcellent 4.966781e-05 2.157619e-04
## EconomyGood 1.088019e-14 1.778577e-13
## EconomyFair 2.135890e-02 5.236021e-01
## EconomyPoor 3.827491e-08 4.833394e-10
## NationalEconomyGettingBetter 1.257979e-01 4.385502e-02
## NationalEconomyGettingWorse 1.203130e-03 5.121617e-04
## EconomicConfidenceIndex 1.387460e-07 1.918349e-08
## IndependentPartyApproval
## CongressionalApproval 4.529710e-14
## CongressionalDisapproval 4.440892e-15
## GOPApproval 8.360889e-02
## IndependentPartyApproval NA
## DemocraticPartyIdentification 3.689431e-01
## GoodTimeFindQualityJob 1.654610e-01
## BadTimeFindQualityJob 1.284157e-01
## PresidentialApproval 1.271141e-02
## EconIssuesMostImpProblem 1.020531e-01
## EconomyExcellent 8.204077e-03
## EconomyGood 6.008127e-06
## EconomyFair 1.221374e-01
## EconomyPoor 6.557562e-04
## NationalEconomyGettingBetter 4.873828e-02
## NationalEconomyGettingWorse 2.581949e-03
## EconomicConfidenceIndex 1.170463e-04
## DemocraticPartyIdentification
## CongressionalApproval 6.774719e-06
## CongressionalDisapproval 1.061928e-05
## GOPApproval 0.000000e+00
## IndependentPartyApproval 3.689431e-01
## DemocraticPartyIdentification NA
## GoodTimeFindQualityJob 7.337808e-01
## BadTimeFindQualityJob 6.147119e-01
## PresidentialApproval 7.364208e-01
## EconIssuesMostImpProblem 6.262270e-01
## EconomyExcellent 1.037823e-02
## EconomyGood 9.174116e-02
## EconomyFair 1.703537e-05
## EconomyPoor 6.994654e-01
## NationalEconomyGettingBetter 3.203189e-03
## NationalEconomyGettingWorse 4.355878e-02
## EconomicConfidenceIndex 4.372410e-01
## GoodTimeFindQualityJob BadTimeFindQualityJob
## CongressionalApproval 1.687949e-02 1.777501e-02
## CongressionalDisapproval 1.539328e-02 1.513070e-02
## GOPApproval 1.315576e-02 2.168236e-02
## IndependentPartyApproval 1.654610e-01 1.284157e-01
## DemocraticPartyIdentification 7.337808e-01 6.147119e-01
## GoodTimeFindQualityJob NA 0.000000e+00
## BadTimeFindQualityJob 0.000000e+00 NA
## PresidentialApproval 2.202381e-05 5.586459e-06
## EconIssuesMostImpProblem 0.000000e+00 0.000000e+00
## EconomyExcellent 0.000000e+00 0.000000e+00
## EconomyGood 0.000000e+00 0.000000e+00
## EconomyFair 6.820951e-05 2.935308e-04
## EconomyPoor 0.000000e+00 0.000000e+00
## NationalEconomyGettingBetter 2.140817e-01 3.955846e-01
## NationalEconomyGettingWorse 3.888920e-02 9.345588e-02
## EconomicConfidenceIndex 2.220446e-15 5.284662e-14
## PresidentialApproval
## CongressionalApproval 4.292395e-03
## CongressionalDisapproval 2.573616e-02
## GOPApproval 3.660991e-01
## IndependentPartyApproval 1.271141e-02
## DemocraticPartyIdentification 7.364208e-01
## GoodTimeFindQualityJob 2.202381e-05
## BadTimeFindQualityJob 5.586459e-06
## PresidentialApproval NA
## EconIssuesMostImpProblem 2.827989e-04
## EconomyExcellent 3.703808e-07
## EconomyGood 1.290495e-02
## EconomyFair 8.773585e-01
## EconomyPoor 8.970099e-03
## NationalEconomyGettingBetter 1.074252e-12
## NationalEconomyGettingWorse 8.686030e-11
## EconomicConfidenceIndex 3.928154e-02
## EconIssuesMostImpProblem EconomyExcellent
## CongressionalApproval 3.934164e-04 8.349522e-05
## CongressionalDisapproval 6.871645e-04 4.966781e-05
## GOPApproval 1.005354e-05 2.157619e-04
## IndependentPartyApproval 1.020531e-01 8.204077e-03
## DemocraticPartyIdentification 6.262270e-01 1.037823e-02
## GoodTimeFindQualityJob 0.000000e+00 0.000000e+00
## BadTimeFindQualityJob 0.000000e+00 0.000000e+00
## PresidentialApproval 2.827989e-04 3.703808e-07
## EconIssuesMostImpProblem NA 0.000000e+00
## EconomyExcellent 0.000000e+00 NA
## EconomyGood 0.000000e+00 0.000000e+00
## EconomyFair 2.792352e-08 4.747698e-01
## EconomyPoor 0.000000e+00 0.000000e+00
## NationalEconomyGettingBetter 5.662657e-04 2.493260e-01
## NationalEconomyGettingWorse 2.928352e-06 7.319513e-03
## EconomicConfidenceIndex 0.000000e+00 6.661338e-16
## EconomyGood EconomyFair EconomyPoor
## CongressionalApproval 3.996803e-15 3.625379e-02 1.996437e-08
## CongressionalDisapproval 1.088019e-14 2.135890e-02 3.827491e-08
## GOPApproval 1.778577e-13 5.236021e-01 4.833394e-10
## IndependentPartyApproval 6.008127e-06 1.221374e-01 6.557562e-04
## DemocraticPartyIdentification 9.174116e-02 1.703537e-05 6.994654e-01
## GoodTimeFindQualityJob 0.000000e+00 6.820951e-05 0.000000e+00
## BadTimeFindQualityJob 0.000000e+00 2.935308e-04 0.000000e+00
## PresidentialApproval 1.290495e-02 8.773585e-01 8.970099e-03
## EconIssuesMostImpProblem 0.000000e+00 2.792352e-08 0.000000e+00
## EconomyExcellent 0.000000e+00 4.747698e-01 0.000000e+00
## EconomyGood NA 6.379910e-04 0.000000e+00
## EconomyFair 6.379910e-04 NA 2.478906e-12
## EconomyPoor 0.000000e+00 2.478906e-12 NA
## NationalEconomyGettingBetter 8.345952e-05 2.353229e-12 1.039372e-07
## NationalEconomyGettingWorse 1.065208e-08 1.013412e-11 2.559286e-12
## EconomicConfidenceIndex 0.000000e+00 3.572698e-13 0.000000e+00
## NationalEconomyGettingBetter
## CongressionalApproval 3.181431e-02
## CongressionalDisapproval 1.257979e-01
## GOPApproval 4.385502e-02
## IndependentPartyApproval 4.873828e-02
## DemocraticPartyIdentification 3.203189e-03
## GoodTimeFindQualityJob 2.140817e-01
## BadTimeFindQualityJob 3.955846e-01
## PresidentialApproval 1.074252e-12
## EconIssuesMostImpProblem 5.662657e-04
## EconomyExcellent 2.493260e-01
## EconomyGood 8.345952e-05
## EconomyFair 2.353229e-12
## EconomyPoor 1.039372e-07
## NationalEconomyGettingBetter NA
## NationalEconomyGettingWorse 0.000000e+00
## EconomicConfidenceIndex 0.000000e+00
## NationalEconomyGettingWorse
## CongressionalApproval 1.201975e-04
## CongressionalDisapproval 1.203130e-03
## GOPApproval 5.121617e-04
## IndependentPartyApproval 2.581949e-03
## DemocraticPartyIdentification 4.355878e-02
## GoodTimeFindQualityJob 3.888920e-02
## BadTimeFindQualityJob 9.345588e-02
## PresidentialApproval 8.686030e-11
## EconIssuesMostImpProblem 2.928352e-06
## EconomyExcellent 7.319513e-03
## EconomyGood 1.065208e-08
## EconomyFair 1.013412e-11
## EconomyPoor 2.559286e-12
## NationalEconomyGettingBetter 0.000000e+00
## NationalEconomyGettingWorse NA
## EconomicConfidenceIndex 0.000000e+00
## EconomicConfidenceIndex
## CongressionalApproval 1.021924e-08
## CongressionalDisapproval 1.387460e-07
## GOPApproval 1.918349e-08
## IndependentPartyApproval 1.170463e-04
## DemocraticPartyIdentification 4.372410e-01
## GoodTimeFindQualityJob 2.220446e-15
## BadTimeFindQualityJob 5.284662e-14
## PresidentialApproval 3.928154e-02
## EconIssuesMostImpProblem 0.000000e+00
## EconomyExcellent 6.661338e-16
## EconomyGood 0.000000e+00
## EconomyFair 3.572698e-13
## EconomyPoor 0.000000e+00
## NationalEconomyGettingBetter 0.000000e+00
## NationalEconomyGettingWorse 0.000000e+00
## EconomicConfidenceIndex NA
The output of the function rcorr() is a list containing the following elements: r, the correlation matrix; n, the matrix of the number of observations used in analyzing each pair of variables; and P, the p-values corresponding to the significance levels of the correlations.
The flattenCorrMatrix function will format the correlation matrix into a table of four columns: row names, column names, the correlation coefficient between each variable and the others, and the p-values.
library(Hmisc)
source('C:/Users/Michael/Desktop/R Code and Data/data/gallup/flattenCorrMatrix.R')
res2 <- rcorr(as.matrix(gallup_vars.active))
flattenCorrMatrix(res2$r, res2$P)
## row column
## 1 CongressionalApproval CongressionalDisapproval
## 2 CongressionalApproval GOPApproval
## 3 CongressionalDisapproval GOPApproval
## 4 CongressionalApproval IndependentPartyApproval
## 5 CongressionalDisapproval IndependentPartyApproval
## 6 GOPApproval IndependentPartyApproval
## 7 CongressionalApproval DemocraticPartyIdentification
## 8 CongressionalDisapproval DemocraticPartyIdentification
## 9 GOPApproval DemocraticPartyIdentification
## 10 IndependentPartyApproval DemocraticPartyIdentification
## 11 CongressionalApproval GoodTimeFindQualityJob
## 12 CongressionalDisapproval GoodTimeFindQualityJob
## 13 GOPApproval GoodTimeFindQualityJob
## 14 IndependentPartyApproval GoodTimeFindQualityJob
## 15 DemocraticPartyIdentification GoodTimeFindQualityJob
## 16 CongressionalApproval BadTimeFindQualityJob
## 17 CongressionalDisapproval BadTimeFindQualityJob
## 18 GOPApproval BadTimeFindQualityJob
## 19 IndependentPartyApproval BadTimeFindQualityJob
## 20 DemocraticPartyIdentification BadTimeFindQualityJob
## 21 GoodTimeFindQualityJob BadTimeFindQualityJob
## 22 CongressionalApproval PresidentialApproval
## 23 CongressionalDisapproval PresidentialApproval
## 24 GOPApproval PresidentialApproval
## 25 IndependentPartyApproval PresidentialApproval
## 26 DemocraticPartyIdentification PresidentialApproval
## 27 GoodTimeFindQualityJob PresidentialApproval
## 28 BadTimeFindQualityJob PresidentialApproval
## 29 CongressionalApproval EconIssuesMostImpProblem
## 30 CongressionalDisapproval EconIssuesMostImpProblem
## 31 GOPApproval EconIssuesMostImpProblem
## 32 IndependentPartyApproval EconIssuesMostImpProblem
## 33 DemocraticPartyIdentification EconIssuesMostImpProblem
## 34 GoodTimeFindQualityJob EconIssuesMostImpProblem
## 35 BadTimeFindQualityJob EconIssuesMostImpProblem
## 36 PresidentialApproval EconIssuesMostImpProblem
## 37 CongressionalApproval EconomyExcellent
## 38 CongressionalDisapproval EconomyExcellent
## 39 GOPApproval EconomyExcellent
## 40 IndependentPartyApproval EconomyExcellent
## 41 DemocraticPartyIdentification EconomyExcellent
## 42 GoodTimeFindQualityJob EconomyExcellent
## 43 BadTimeFindQualityJob EconomyExcellent
## 44 PresidentialApproval EconomyExcellent
## 45 EconIssuesMostImpProblem EconomyExcellent
## 46 CongressionalApproval EconomyGood
## 47 CongressionalDisapproval EconomyGood
## 48 GOPApproval EconomyGood
## 49 IndependentPartyApproval EconomyGood
## 50 DemocraticPartyIdentification EconomyGood
## 51 GoodTimeFindQualityJob EconomyGood
## 52 BadTimeFindQualityJob EconomyGood
## 53 PresidentialApproval EconomyGood
## 54 EconIssuesMostImpProblem EconomyGood
## 55 EconomyExcellent EconomyGood
## 56 CongressionalApproval EconomyFair
## 57 CongressionalDisapproval EconomyFair
## 58 GOPApproval EconomyFair
## 59 IndependentPartyApproval EconomyFair
## 60 DemocraticPartyIdentification EconomyFair
## 61 GoodTimeFindQualityJob EconomyFair
## 62 BadTimeFindQualityJob EconomyFair
## 63 PresidentialApproval EconomyFair
## 64 EconIssuesMostImpProblem EconomyFair
## 65 EconomyExcellent EconomyFair
## 66 EconomyGood EconomyFair
## 67 CongressionalApproval EconomyPoor
## 68 CongressionalDisapproval EconomyPoor
## 69 GOPApproval EconomyPoor
## 70 IndependentPartyApproval EconomyPoor
## 71 DemocraticPartyIdentification EconomyPoor
## 72 GoodTimeFindQualityJob EconomyPoor
## 73 BadTimeFindQualityJob EconomyPoor
## 74 PresidentialApproval EconomyPoor
## 75 EconIssuesMostImpProblem EconomyPoor
## 76 EconomyExcellent EconomyPoor
## 77 EconomyGood EconomyPoor
## 78 EconomyFair EconomyPoor
## 79 CongressionalApproval NationalEconomyGettingBetter
## 80 CongressionalDisapproval NationalEconomyGettingBetter
## 81 GOPApproval NationalEconomyGettingBetter
## 82 IndependentPartyApproval NationalEconomyGettingBetter
## 83 DemocraticPartyIdentification NationalEconomyGettingBetter
## 84 GoodTimeFindQualityJob NationalEconomyGettingBetter
## 85 BadTimeFindQualityJob NationalEconomyGettingBetter
## 86 PresidentialApproval NationalEconomyGettingBetter
## 87 EconIssuesMostImpProblem NationalEconomyGettingBetter
## 88 EconomyExcellent NationalEconomyGettingBetter
## 89 EconomyGood NationalEconomyGettingBetter
## 90 EconomyFair NationalEconomyGettingBetter
## 91 EconomyPoor NationalEconomyGettingBetter
## 92 CongressionalApproval NationalEconomyGettingWorse
## 93 CongressionalDisapproval NationalEconomyGettingWorse
## 94 GOPApproval NationalEconomyGettingWorse
## 95 IndependentPartyApproval NationalEconomyGettingWorse
## 96 DemocraticPartyIdentification NationalEconomyGettingWorse
## 97 GoodTimeFindQualityJob NationalEconomyGettingWorse
## 98 BadTimeFindQualityJob NationalEconomyGettingWorse
## 99 PresidentialApproval NationalEconomyGettingWorse
## 100 EconIssuesMostImpProblem NationalEconomyGettingWorse
## 101 EconomyExcellent NationalEconomyGettingWorse
## 102 EconomyGood NationalEconomyGettingWorse
## 103 EconomyFair NationalEconomyGettingWorse
## 104 EconomyPoor NationalEconomyGettingWorse
## 105 NationalEconomyGettingBetter NationalEconomyGettingWorse
## 106 CongressionalApproval EconomicConfidenceIndex
## 107 CongressionalDisapproval EconomicConfidenceIndex
## 108 GOPApproval EconomicConfidenceIndex
## 109 IndependentPartyApproval EconomicConfidenceIndex
## 110 DemocraticPartyIdentification EconomicConfidenceIndex
## 111 GoodTimeFindQualityJob EconomicConfidenceIndex
## 112 BadTimeFindQualityJob EconomicConfidenceIndex
## 113 PresidentialApproval EconomicConfidenceIndex
## 114 EconIssuesMostImpProblem EconomicConfidenceIndex
## 115 EconomyExcellent EconomicConfidenceIndex
## 116 EconomyGood EconomicConfidenceIndex
## 117 EconomyFair EconomicConfidenceIndex
## 118 EconomyPoor EconomicConfidenceIndex
## 119 NationalEconomyGettingBetter EconomicConfidenceIndex
## 120 NationalEconomyGettingWorse EconomicConfidenceIndex
## cor p
## 1 -0.98755223 0.000000e+00
## 2 0.58550298 2.220446e-16
## 3 -0.56007004 7.549517e-15
## 4 -0.54642725 4.529710e-14
## 5 0.56430292 4.440892e-15
## 6 -0.13592984 8.360889e-02
## 7 0.34432718 6.774719e-06
## 8 -0.33738780 1.061928e-05
## 9 0.60066056 0.000000e+00
## 10 0.07082908 3.689431e-01
## 11 0.18693204 1.687949e-02
## 12 -0.18951865 1.539328e-02
## 13 0.19385727 1.315576e-02
## 14 -0.10914844 1.654610e-01
## 15 0.02684049 7.337808e-01
## 16 -0.18546768 1.777501e-02
## 17 0.18999811 1.513070e-02
## 18 -0.17974387 2.168236e-02
## 19 0.11957784 1.284157e-01
## 20 -0.03971621 6.147119e-01
## 21 -0.99634165 0.000000e+00
## 22 0.22258599 4.292395e-03
## 23 -0.17467946 2.573616e-02
## 24 0.07124973 3.660991e-01
## 25 -0.19479504 1.271141e-02
## 26 -0.02656408 7.364208e-01
## 27 -0.32575768 2.202381e-05
## 28 0.34725454 5.586459e-06
## 29 -0.27438745 3.934164e-04
## 30 0.26321083 6.871645e-04
## 31 -0.33824196 1.005354e-05
## 32 0.12851901 1.020531e-01
## 33 -0.03842990 6.262270e-01
## 34 -0.87116969 0.000000e+00
## 35 0.85677624 0.000000e+00
## 36 0.28078634 2.827989e-04
## 37 0.30319369 8.349522e-05
## 38 -0.31220561 4.966781e-05
## 39 0.28591835 2.157619e-04
## 40 -0.20641293 8.204077e-03
## 41 0.20025048 1.037823e-02
## 42 0.68063915 0.000000e+00
## 43 -0.68506318 0.000000e+00
## 44 -0.38563430 3.703808e-07
## 45 -0.76395047 0.000000e+00
## 46 0.56509131 3.996803e-15
## 47 -0.55745113 1.088019e-14
## 48 0.53546351 1.778577e-13
## 49 -0.34615344 6.008127e-06
## 50 0.13251662 9.174116e-02
## 51 0.79419756 0.000000e+00
## 52 -0.78634655 0.000000e+00
## 53 -0.19438310 1.290495e-02
## 54 -0.86535108 0.000000e+00
## 55 0.77358299 0.000000e+00
## 56 -0.16416907 3.625379e-02
## 57 0.18018225 2.135890e-02
## 58 0.05031211 5.236021e-01
## 59 0.12156745 1.221374e-01
## 60 -0.32990655 1.703537e-05
## 61 0.30673668 6.820951e-05
## 62 -0.28007206 2.935308e-04
## 63 0.01218046 8.773585e-01
## 64 -0.41814154 2.792352e-08
## 65 0.05637205 4.747698e-01
## 66 0.26472679 6.379910e-04
## 67 -0.42211813 1.996437e-08
## 68 0.41435677 3.827491e-08
## 69 -0.46309128 4.833394e-10
## 70 0.26416707 6.557562e-04
## 71 -0.03046459 6.994654e-01
## 72 -0.79958552 0.000000e+00
## 73 0.78620100 0.000000e+00
## 74 0.20409261 8.970099e-03
## 75 0.90034884 0.000000e+00
## 76 -0.75651586 0.000000e+00
## 77 -0.95670152 0.000000e+00
## 78 -0.51319015 2.478906e-12
## 79 0.16823968 3.181431e-02
## 80 -0.12039809 1.257979e-01
## 81 0.15808661 4.385502e-02
## 82 -0.15463102 4.873828e-02
## 83 -0.22954443 3.203189e-03
## 84 0.09783312 2.140817e-01
## 85 -0.06698129 3.955846e-01
## 86 0.52043772 1.074252e-12
## 87 -0.26714334 5.662657e-04
## 88 0.09074272 2.493260e-01
## 89 0.30320123 8.345952e-05
## 90 0.51364684 2.353229e-12
## 91 -0.40204749 1.039372e-07
## 92 -0.29669252 1.201975e-04
## 93 0.25147718 1.203130e-03
## 94 -0.26916093 5.121617e-04
## 95 0.23454432 2.581949e-03
## 96 0.15830640 4.355878e-02
## 97 -0.16194813 3.888920e-02
## 98 0.13182797 9.345588e-02
## 99 -0.48030654 8.686030e-11
## 100 0.35685092 2.928352e-06
## 101 -0.20934449 7.319513e-03
## 102 -0.42942801 1.065208e-08
## 103 -0.50058699 1.013412e-11
## 104 0.51290995 2.559286e-12
## 105 -0.97982675 0.000000e+00
## 106 0.42990467 1.021924e-08
## 107 -0.39839372 1.387460e-07
## 108 0.42258760 1.918349e-08
## 109 -0.29717180 1.170463e-04
## 110 -0.06126272 4.372410e-01
## 111 0.56908137 2.220446e-15
## 112 -0.54517692 5.284662e-14
## 113 0.16162841 3.928154e-02
## 114 -0.73012298 0.000000e+00
## 115 0.57749361 6.661338e-16
## 116 0.81390089 0.000000e+00
## 117 0.52972317 3.572698e-13
## 118 -0.86882377 0.000000e+00
## 119 0.79125065 0.000000e+00
## 120 -0.86457312 0.000000e+00
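The contents of flattenCorrMatrix.R are not shown in this study. The following is a minimal sketch of what such a helper might look like, assuming it keeps only the upper triangle of the two matrices, which is consistent with the 120 pairs listed above for 16 variables. To stay self-contained it is demonstrated on the built-in mtcars data, with p-values computed by cor.test() instead of rcorr(); both choices are assumptions made for illustration.

```r
# Hypothetical stand-in for the sourced helper: flatten a correlation
# matrix and its p-value matrix into a four-column table.
flattenCorrMatrix <- function(cormat, pmat) {
  ut <- upper.tri(cormat)  # TRUE for cells above the diagonal
  data.frame(
    row    = rownames(cormat)[row(cormat)[ut]],
    column = rownames(cormat)[col(cormat)[ut]],
    cor    = cormat[ut],
    p      = pmat[ut]
  )
}

# Illustration data: four columns of mtcars (not the Gallup data)
x <- as.matrix(mtcars[, 1:4])
r <- cor(x)

# Pairwise p-values from Pearson correlation tests
P <- matrix(NA_real_, ncol(x), ncol(x), dimnames = dimnames(r))
for (i in seq_len(ncol(x))) {
  for (j in seq_len(ncol(x))) {
    if (i != j) P[i, j] <- cor.test(x[, i], x[, j])$p.value
  }
}

flat <- flattenCorrMatrix(r, P)
print(flat)
```

Each pair of variables appears once, in the same column-by-column order as the table above.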
We can visualize the correlation matrix using a correlogram. The package corrplot is required. The function corrplot() takes the correlation matrix as the first argument. The second argument (type="upper") is used to display only the upper triangle of the correlation matrix.
Note that positive correlations are displayed in blue and negative correlations in red. Color intensity and the size of the circle are proportional to the correlation coefficients. On the right side of the correlogram, the color legend shows the correlation coefficients and the corresponding colors.
library("corrplot")
corrplot(cor.mat, type="upper", order="hclust",
tl.col="black", tl.srt=45)
We can make a scatter plot matrix showing the correlation coefficients between variables and their significance levels. The package PerformanceAnalytics is required.
library("PerformanceAnalytics")
chart.Correlation(gallup_vars.active[, 1:6], histogram=TRUE, pch=19)
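In the above plot, the distribution of each variable is shown on the diagonal, the bivariate scatter plots with a fitted line are shown below the diagonal, and the correlation coefficients with their significance levels (as stars) are shown above the diagonal.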
The function PCA() in the FactoMineR package will be used to conduct our PCA study. A simplified format is:
PCA(X, scale.unit = TRUE, ncp = 5, graph = TRUE)
# Perform the PCA
library(FactoMineR)
res.pca <- PCA(gallup_vars.active, graph = FALSE)
The output of the function PCA() is a list including:
print(res.pca)
## **Results for the Principal Component Analysis (PCA)**
## The analysis was performed on 163 individuals, described by 16 variables
## *The results are available in the following objects:
##
## name description
## 1 "$eig" "eigenvalues"
## 2 "$var" "results for the variables"
## 3 "$var$coord" "coord. for the variables"
## 4 "$var$cor" "correlations variables - dimensions"
## 5 "$var$cos2" "cos2 for the variables"
## 6 "$var$contrib" "contributions of the variables"
## 7 "$ind" "results for the individuals"
## 8 "$ind$coord" "coord. for the individuals"
## 9 "$ind$cos2" "cos2 for the individuals"
## 10 "$ind$contrib" "contributions of the individuals"
## 11 "$call" "summary statistics"
## 12 "$call$centre" "mean of the variables"
## 13 "$call$ecart.type" "standard error of the variables"
## 14 "$call$row.w" "weights for the individuals"
## 15 "$call$col.w" "weights for the variables"
The object created by the function PCA() stores its results in several lists and matrices. These values are described in the next section.
The proportion of variance retained by each principal component can be extracted as follows:
eigenvalues <- res.pca$eig
head(eigenvalues[, 1:2])
## eigenvalue percentage of variance
## comp 1 7.0182722 43.864201
## comp 2 2.8940054 18.087534
## comp 3 2.7279418 17.049637
## comp 4 1.2243686 7.652304
## comp 5 0.6023628 3.764767
## comp 6 0.5353847 3.346155
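For a PCA on standardized variables, the eigenvalues are those of the correlation matrix, and each percentage is the eigenvalue divided by the number of variables. In the output above, for example, 7.0182722 / 16 × 100 ≈ 43.86. The sketch below reproduces this calculation in base R on the built-in USArrests data, which is used here only as stand-in illustration data; the column labels are chosen to resemble the FactoMineR output and are otherwise arbitrary.

```r
# Illustration data: built-in USArrests (not the Gallup data)
X <- USArrests

# Eigenvalues of the correlation matrix (PCA on standardized variables)
eig <- eigen(cor(X))$values

# Percentage of variance explained by each component
pct <- 100 * eig / sum(eig)

eig_table <- cbind(
  eigenvalue = eig,
  `percentage of variance` = pct,
  `cumulative percentage of variance` = cumsum(pct)
)
rownames(eig_table) <- paste("comp", seq_along(eig))
print(eig_table)
```

The eigenvalues sum to the number of variables, so the percentages sum to 100.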
We can visualize this by creating a scree plot using base R graphics:
barplot(eigenvalues[, 2], names.arg=1:nrow(eigenvalues),
main = "Variances",
xlab = "Principal Components",
ylab = "Percentage of variances",
col ="steelblue")
# Add connected line segments to the plot
lines(x = 1:nrow(eigenvalues), eigenvalues[, 2],
type="b", pch=19, col = "red")
We can also make the scree plot using the package factoextra:
fviz_screeplot(res.pca, ncp=10)
The function plot.PCA() can be used to draw the graph of individuals or the graph of variables. A simplified format is:
plot.PCA(x, axes = c(1,2), choix = c("ind", "var"))
PCA(gallup_vars.active, scale.unit=TRUE, ncp=5, ind.sup=NULL,
quanti.sup=NULL, quali.sup=NULL, graph=TRUE, axes = c(1,2))
## **Results for the Principal Component Analysis (PCA)**
## The analysis was performed on 163 individuals, described by 16 variables
## *The results are available in the following objects:
##
## name description
## 1 "$eig" "eigenvalues"
## 2 "$var" "results for the variables"
## 3 "$var$coord" "coord. for the variables"
## 4 "$var$cor" "correlations variables - dimensions"
## 5 "$var$cos2" "cos2 for the variables"
## 6 "$var$contrib" "contributions of the variables"
## 7 "$ind" "results for the individuals"
## 8 "$ind$coord" "coord. for the individuals"
## 9 "$ind$cos2" "cos2 for the individuals"
## 10 "$ind$contrib" "contributions of the individuals"
## 11 "$call" "summary statistics"
## 12 "$call$centre" "mean of the variables"
## 13 "$call$ecart.type" "standard error of the variables"
## 14 "$call$row.w" "weights for the individuals"
## 15 "$call$col.w" "weights for the variables"
res <- PCA(gallup_vars.active, graph = FALSE)
This will get the coordinates of variables on the principal components:
head(res.pca$var$coord)
## Dim.1 Dim.2 Dim.3
## CongressionalApproval 0.5569729 0.36359255 0.6708696
## CongressionalDisapproval -0.5426159 -0.32350952 -0.6891506
## GOPApproval 0.5231316 0.18698744 0.4914883
## IndependentPartyApproval -0.3537076 -0.32414902 -0.3486855
## DemocraticPartyIdentification 0.1117742 -0.08700543 0.6718603
## GoodTimeFindQualityJob 0.7898765 -0.48119899 -0.1360081
## Dim.4 Dim.5
## CongressionalApproval -0.13978824 -0.13503294
## CongressionalDisapproval 0.17709650 0.15867504
## GOPApproval 0.53262682 -0.21361578
## IndependentPartyApproval 0.66510093 0.05587073
## DemocraticPartyIdentification 0.60724381 0.20051545
## GoodTimeFindQualityJob -0.07762488 0.12697547
The quality of representation of the variables on the principal components is called the cos2 (squared cosine). It is the square of the variable coordinates.
head(res.pca$var$cos2)
## Dim.1 Dim.2 Dim.3
## CongressionalApproval 0.31021886 0.132199539 0.45006605
## CongressionalDisapproval 0.29443205 0.104658411 0.47492860
## GOPApproval 0.27366670 0.034964304 0.24156077
## IndependentPartyApproval 0.12510904 0.105072586 0.12158156
## DemocraticPartyIdentification 0.01249348 0.007569945 0.45139623
## GoodTimeFindQualityJob 0.62390488 0.231552467 0.01849821
## Dim.4 Dim.5
## CongressionalApproval 0.019540752 0.018233895
## CongressionalDisapproval 0.031363169 0.025177769
## GOPApproval 0.283691325 0.045631701
## IndependentPartyApproval 0.442359243 0.003121538
## DemocraticPartyIdentification 0.368745050 0.040206446
## GoodTimeFindQualityJob 0.006025622 0.016122770
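The link between the coordinates and the cos2 can be checked from the two outputs above: for CongressionalApproval on Dim.1, 0.5569729² ≈ 0.3102. The following base R sketch shows the same relationship using prcomp() on the built-in USArrests data. Both the data set and the use of prcomp() in place of PCA() are assumptions made so that the example runs without FactoMineR.

```r
# Illustration data: built-in USArrests (not the Gallup data)
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Variable coordinates: eigenvectors scaled by the standard deviation
# of each component. For standardized data these equal the correlations
# between the variables and the components.
var_coord <- sweep(pca$rotation, 2, pca$sdev, `*`)

# Quality of representation: squared coordinates
var_cos2 <- var_coord^2
print(round(var_cos2, 3))

# With all components kept, the cos2 of each variable sum to 1
print(rowSums(var_cos2))
```

A cos2 close to 1 on a component means the variable is well represented by that component.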
The contribution of a variable to a given principal component is computed (in percent) as:
(var.cos2 * 100) / (total cos2 of the component)
head(res.pca$var$contrib)
## Dim.1 Dim.2 Dim.3 Dim.4
## CongressionalApproval 4.4201601 4.5680474 16.4983740 1.5959860
## CongressionalDisapproval 4.1952213 3.6163862 17.4097773 2.5615790
## GOPApproval 3.8993458 1.2081631 8.8550558 23.1704181
## IndependentPartyApproval 1.7826188 3.6306977 4.4568971 36.1295806
## DemocraticPartyIdentification 0.1780136 0.2615733 16.5471355 30.1171597
## GoodTimeFindQualityJob 8.8897220 8.0011070 0.6781013 0.4921411
## Dim.5
## CongressionalApproval 3.0270623
## CongressionalDisapproval 4.1798350
## GOPApproval 7.5754520
## IndependentPartyApproval 0.5182157
## DemocraticPartyIdentification 6.6747896
## GoodTimeFindQualityJob 2.6765882
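The formula can be checked against the outputs above: the total cos2 of a component equals its eigenvalue, so for CongressionalApproval on Dim.1 the contribution is 0.31021886 × 100 / 7.0182722 ≈ 4.42. A minimal base R sketch of the same calculation follows, again using prcomp() on the built-in USArrests data as an assumed stand-in for the FactoMineR result.

```r
# Illustration data: built-in USArrests (not the Gallup data)
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# cos2 of the variables: squared variable coordinates
var_cos2 <- sweep(pca$rotation, 2, pca$sdev, `*`)^2

# Total cos2 per component (equal to the eigenvalue of that component)
total_cos2 <- colSums(var_cos2)

# Contribution of each variable to each component, in percent
var_contrib <- sweep(var_cos2, 2, total_cos2, `/`) * 100
print(round(var_contrib, 2))

# The contributions to each component sum to 100
print(colSums(var_contrib))
```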
plot(res.pca, choix = "var")
The function fviz_pca_var() is used to visualize the variables:
# Default plot
fviz_pca_var(res.pca)
# Change color and theme
fviz_pca_var(res.pca, col.var="steelblue")+
theme_minimal()
# Control variable colors using their contribution
# Possible values for the argument col.var are:
# "cos2", "contrib", "coord", "x", "y"
fviz_pca_var(res.pca, col.var="contrib")
# Change the gradient color
fviz_pca_var(res.pca, col.var="contrib") +
scale_color_gradient2(low="white", mid="blue",
high="red", midpoint=55) +
theme_bw()
# Control the transparency of variables using their contribution
# Possible values for the argument alpha.var are:
# "cos2", "contrib", "coord", "x", "y"
fviz_pca_var(res.pca, alpha.var="contrib")+
theme_minimal()
fviz_pca_biplot(res.pca, geom = "text")
The function dimdesc() can be used to identify the variables most correlated with a given principal component.
A simplified format is:
dimdesc(res, axes = 1:3, proba = 0.05)
Here is an example of usage:
res.desc <- dimdesc(res.pca, axes = c(1,2))
# Description of dimension 1
res.desc$Dim.1
## $quanti
## correlation p.value
## EconomyGood 0.9641906 1.026975e-94
## EconomicConfidenceIndex 0.9036939 3.411436e-61
## GoodTimeFindQualityJob 0.7898765 5.130578e-36
## EconomyExcellent 0.7557959 2.049964e-31
## CongressionalApproval 0.5569729 1.154894e-14
## GOPApproval 0.5231316 7.832102e-13
## NationalEconomyGettingBetter 0.4724129 1.930925e-10
## EconomyFair 0.3983387 1.393471e-07
## IndependentPartyApproval -0.3537076 3.626766e-06
## CongressionalDisapproval -0.5426159 7.320973e-14
## NationalEconomyGettingWorse -0.5867922 1.852951e-16
## BadTimeFindQualityJob -0.7762424 4.447971e-34
## EconIssuesMostImpProblem -0.8877230 4.076556e-56
## EconomyPoor -0.9687478 2.146154e-99
# Description of dimension 2
res.desc$Dim.2
## $quanti
## correlation p.value
## PresidentialApproval 0.8307182 8.343893e-43
## NationalEconomyGettingBetter 0.6969911 4.991305e-25
## BadTimeFindQualityJob 0.5039352 7.010668e-12
## CongressionalApproval 0.3635925 1.836611e-06
## EconIssuesMostImpProblem 0.3159317 3.987068e-05
## EconomicConfidenceIndex 0.3142492 4.404528e-05
## GOPApproval 0.1869874 1.684640e-02
## CongressionalDisapproval -0.3235095 2.527312e-05
## IndependentPartyApproval -0.3241490 2.430559e-05
## EconomyExcellent -0.3780998 6.488223e-07
## GoodTimeFindQualityJob -0.4811990 7.925669e-11
## NationalEconomyGettingWorse -0.6801663 1.776039e-23
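For quantitative variables, the tables above list the correlation of each variable with the component, keep those whose p-value is below the proba threshold, and sort them by correlation. The following is a rough base R sketch of that idea, not the dimdesc() implementation itself; it assumes prcomp() and the built-in USArrests data for illustration, and the sign of a component returned by prcomp() is arbitrary.

```r
# Illustration data: built-in USArrests (not the Gallup data)
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1]  # individual scores on the first component

# Correlation of each variable with the component, and its p-value
desc <- t(sapply(USArrests, function(v) {
  ct <- cor.test(v, scores)
  c(correlation = unname(ct$estimate), p.value = ct$p.value)
}))

# Keep significant variables and sort by decreasing correlation
desc <- desc[desc[, "p.value"] < 0.05, , drop = FALSE]
desc <- desc[order(desc[, "correlation"], decreasing = TRUE), , drop = FALSE]
print(desc)
```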
The package FactoInvestigate generates an automatic report that interprets a FactoMineR-based principal component analysis. The main function provided by the package is Investigate(), which can be used to create a Word, PDF, or HTML report.
The report includes the following items:
library(FactoInvestigate)
#Investigate(res,document="pdf_document",
# parallel = TRUE)