Introduction

This study performs a principal component analysis on sixteen Gallup polling variables to uncover patterns in the data and to identify smaller groups of homogeneous items within the larger set.

Principal component analysis (PCA) summarizes the variation (information) in a data set described by multiple variables. Each variable can be considered a separate dimension, so a data set with more than three variables is difficult to visualize directly.

The goal of PCA is to transform the initial variables into a new set of variables that explain the variation in the data. These new variables are linear combinations of the originals and are called principal components.
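
To make the phrase "linear combination" concrete, here is a minimal sketch using base R's prcomp() on a small simulated data set. The variables x1, x2 and x3 are invented for illustration and are not Gallup data. The sketch checks that the component scores equal the centered data multiplied by the matrix of weights.

```r
# Simulated data: three variables, two of them correlated (hypothetical values)
set.seed(42)
n  <- 100
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.5)
x3 <- rnorm(n)
toy <- cbind(x1 = x1, x2 = x2, x3 = x3)

# PCA on centered (unscaled) data
toy_pca <- prcomp(toy, center = TRUE, scale. = FALSE)

# Each column of 'rotation' holds the weights of one linear combination
toy_pca$rotation

# Scores computed by hand: centered data times the weights
centered      <- scale(toy, center = TRUE, scale = FALSE)
manual_scores <- centered %*% toy_pca$rotation

# The hand-computed scores match the scores returned by prcomp()
scores_match <- isTRUE(all.equal(manual_scores, toy_pca$x,
                                 check.attributes = FALSE))
scores_match
```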

PCA reduces the dimensionality of multivariate data to two or three principal components that can be visualized graphically with minimal loss of information.
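
How much information is lost can be judged from the share of total variance carried by each component. The sketch below is only an illustration: it uses the built-in USArrests data as a stand-in for a multivariate data set and base R's prcomp(), not the Gallup data or FactoMineR.

```r
# Built-in USArrests data (4 variables) stands in for a multivariate data set
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Variance of each component and the share of total variance it carries
eigenvalues   <- pca$sdev^2
prop_variance <- eigenvalues / sum(eigenvalues)
cum_variance  <- cumsum(prop_variance)

round(data.frame(eigenvalue = eigenvalues,
                 proportion = prop_variance,
                 cumulative = cum_variance), 3)

# Keeping the first two columns of the scores gives a 2-D representation
scores_2d <- pca$x[, 1:2]
dim(scores_2d)
```

The cumulative column shows how much of the total variance is retained when only the first few components are kept.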

Several functions from different packages are available in R for performing PCA, e.g., prcomp and princomp (built-in R stats package) and PCA in the FactoMineR package.
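
As a rough comparison of the two built-in functions, the sketch below runs both on the built-in USArrests data (a stand-in, not the Gallup data) and checks that they agree when the variables are standardized. FactoMineR's PCA() is not run here.

```r
# Same analysis with the two built-in functions
pc_svd   <- prcomp(USArrests, center = TRUE, scale. = TRUE)  # SVD of the scaled data
pc_eigen <- princomp(USArrests, cor = TRUE)                  # eigen-decomposition of the correlation matrix

# Standard deviations of the components agree
sdev_equal <- isTRUE(all.equal(unname(pc_svd$sdev), unname(pc_eigen$sdev)))

# Loadings agree up to the sign of each component
loadings_equal <- isTRUE(all.equal(
        abs(unname(pc_svd$rotation)),
        abs(unname(unclass(pc_eigen$loadings)))
))

c(sdev_equal = sdev_equal, loadings_equal = loadings_equal)
```

When the covariance matrix is analysed instead (cor = FALSE in princomp, scale. = FALSE in prcomp), the two functions use different divisors (n versus n - 1), so their standard deviations differ by a constant factor.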

This study will demonstrate how to perform a principal component analysis using R software and the FactoMineR package. It will also show how to visualize the output of the PCA using the R package factoextra.

Data Understanding

The Gallup polling variables are monthly public opinion and attitude measures from the Gallup Organization, e.g., Economic Confidence, Presidential Approval, and Economy Good. The data cover January 2004 through July 2017.

The PCA study will be conducted using FactoMineR (Husson et al.), an R package well suited to multivariate exploratory data analysis.

FactoMineR and the other packages used in this analysis can be installed and loaded as follows:

setwd("C:/Users/Michael/Desktop/R Code and Data/data/gallup")

# ipak function: install and load multiple R packages.
# check to see if packages are installed. Install them if they are not, 
# then load them into the R session.

ipak <- function(pkg){
        new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
        if (length(new.pkg)) 
                install.packages(new.pkg, dependencies = TRUE)
        sapply(pkg, require, character.only = TRUE)
}

# list of packages required for this analysis
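packages <- c("FactoMineR",       ## ** Multivariate analysis (PCA) **
              "FactoInvestigate", ## ** Automatic description of PCA results **
              "missMDA",          ## ** Missing values in multivariate analysis **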
              "corrplot",
              "PerformanceAnalytics",
              "readr",      ## ** File processing ** 
              "plyr",       ## ** Data wrangling **
              "dplyr",      ## ** Data wrangling **
              "reshape2",   ## ** Data wrangling **
              "knitr",      ## ** Reproducibility ** 
              "devtools",   ## ** R package tools **
              "DT",         ## ** DataTables **
              "xtable",     ## ** Export tables to HTML **
              "pander",     ## ** Pandoc writer for R **
              "Hmisc",      ## ** Data cleansing (imputation) **
              "ggplot2")    ## ** Plotting and Visualization **

ipak(packages)
##           FactoMineR     FactoInvestigate              missMDA 
##                 TRUE                 TRUE                 TRUE 
##             corrplot PerformanceAnalytics                readr 
##                 TRUE                 TRUE                 TRUE 
##                 plyr                dplyr             reshape2 
##                 TRUE                 TRUE                 TRUE 
##                knitr             devtools                   DT 
##                 TRUE                 TRUE                 TRUE 
##               xtable               pander                Hmisc 
##                 TRUE                 TRUE                 TRUE 
##              ggplot2 
##                 TRUE
# Load it
library(FactoMineR)

The package factoextra provides flexible methods for objects of class PCA, prcomp, princomp, and dudi to quickly extract and visualize the results of the analysis. It uses the ggplot2 plotting system for data visualization.

Install and load package factoextra as follows:

# Install factoextra from GitHub only if it is not already installed
if (!requireNamespace("factoextra", quietly = TRUE)) {
        devtools::install_github("kassambara/factoextra")
}
# Load it
library("factoextra")

Data Preparation

# read csv flat file containing summarized Gallup polling data (1/2004-7/2017):
gallup_vars <- 
        read_csv("gallup_all_vars.csv", col_names = TRUE)
## Parsed with column specification:
## cols(
##   EffDate = col_character(),
##   CongressionalApproval = col_double(),
##   CongressionalDisapproval = col_double(),
##   GOPApproval = col_double(),
##   IndependentPartyApproval = col_double(),
##   DemocraticPartyIdentification = col_double(),
##   GoodTimeFindQualityJob = col_double(),
##   BadTimeFindQualityJob = col_double(),
##   PresidentialApproval = col_double(),
##   EconIssuesMostImpProblem = col_double(),
##   EconomyExcellent = col_double(),
##   EconomyGood = col_double(),
##   EconomyFair = col_double(),
##   EconomyPoor = col_double(),
##   NationalEconomyGettingBetter = col_double(),
##   NationalEconomyGettingWorse = col_double(),
##   EconomicConfidenceIndex = col_integer()
## )
# examine first few rows of data
head(gallup_vars[, 1:6])
## # A tibble: 6 x 6
##    EffDate CongressionalApproval CongressionalDisapproval GOPApproval
##      <chr>                 <dbl>                    <dbl>       <dbl>
## 1 1/1/2004                  0.48                     0.45        0.32
## 2 2/1/2004                  0.41                     0.51        0.31
## 3 3/1/2004                  0.42                     0.51        0.33
## 4 4/1/2004                  0.43                     0.51        0.33
## 5 5/1/2004                  0.41                     0.52        0.34
## 6 6/1/2004                  0.41                     0.52        0.32
## # ... with 2 more variables: IndependentPartyApproval <dbl>,
## #   DemocraticPartyIdentification <dbl>
# examine data stats and look for missing values
summary(gallup_vars)
##    EffDate          CongressionalApproval CongressionalDisapproval
##  Length:163         Min.   :0.0900        Min.   :0.4500          
##  Class :character   1st Qu.:0.1600        1st Qu.:0.6500          
##  Mode  :character   Median :0.2000        Median :0.7400          
##                     Mean   :0.2233        Mean   :0.7158          
##                     3rd Qu.:0.2700        3rd Qu.:0.7950          
##                     Max.   :0.4800        Max.   :0.8600          
##   GOPApproval     IndependentPartyApproval DemocraticPartyIdentification
##  Min.   :0.0000   Min.   :0.0000           Min.   :0.0000               
##  1st Qu.:0.2638   1st Qu.:0.3500           1st Qu.:0.3020               
##  Median :0.2730   Median :0.3750           Median :0.3150               
##  Mean   :0.2811   Mean   :0.3679           Mean   :0.3185               
##  3rd Qu.:0.2890   3rd Qu.:0.3970           3rd Qu.:0.3400               
##  Max.   :0.3800   Max.   :0.4350           Max.   :0.3850               
##  GoodTimeFindQualityJob BadTimeFindQualityJob PresidentialApproval
##  Min.   :0.0800         Min.   :0.3600        Min.   :0.2500      
##  1st Qu.:0.1950         1st Qu.:0.5500        1st Qu.:0.4100      
##  Median :0.3400         Median :0.6100        Median :0.4600      
##  Mean   :0.3028         Mean   :0.6596        Mean   :0.4472      
##  3rd Qu.:0.4100         3rd Qu.:0.7650        3rd Qu.:0.4943      
##  Max.   :0.5800         Max.   :0.9000        Max.   :0.6600      
##  EconIssuesMostImpProblem EconomyExcellent   EconomyGood    
##  Min.   :0.0800           Min.   :0.01000   Min.   :0.0600  
##  1st Qu.:0.3100           1st Qu.:0.01000   1st Qu.:0.1250  
##  Median :0.4100           Median :0.02000   Median :0.2000  
##  Mean   :0.4637           Mean   :0.02693   Mean   :0.2067  
##  3rd Qu.:0.6400           3rd Qu.:0.03000   3rd Qu.:0.2800  
##  Max.   :0.8600           Max.   :0.11000   Max.   :0.4100  
##   EconomyFair      EconomyPoor     NationalEconomyGettingBetter
##  Min.   :0.3000   Min.   :0.1500   Min.   :0.0900              
##  1st Qu.:0.4200   1st Qu.:0.2300   1st Qu.:0.3000              
##  Median :0.4400   Median :0.3100   Median :0.3800              
##  Mean   :0.4312   Mean   :0.3317   Mean   :0.3544              
##  3rd Qu.:0.4500   3rd Qu.:0.4250   3rd Qu.:0.4200              
##  Max.   :0.4900   Max.   :0.6400   Max.   :0.6600              
##  NationalEconomyGettingWorse EconomicConfidenceIndex
##  Min.   :0.2700              Min.   :-65.00         
##  1st Qu.:0.5200              1st Qu.:-27.00         
##  Median :0.5600              Median :-15.00         
##  Mean   :0.5823              Mean   :-16.44         
##  3rd Qu.:0.6200              3rd Qu.: -4.50         
##  Max.   :0.8800              Max.   : 33.00
# Extract the active observations and variables for the PCA:
# Active observations (all 163 monthly rows, 1/2004-7/2017) - no rows are dropped.
# Active variables (columns 2:17) - the sixteen Gallup variables used for the PCA.
colnames(gallup_vars)
##  [1] "EffDate"                       "CongressionalApproval"        
##  [3] "CongressionalDisapproval"      "GOPApproval"                  
##  [5] "IndependentPartyApproval"      "DemocraticPartyIdentification"
##  [7] "GoodTimeFindQualityJob"        "BadTimeFindQualityJob"        
##  [9] "PresidentialApproval"          "EconIssuesMostImpProblem"     
## [11] "EconomyExcellent"              "EconomyGood"                  
## [13] "EconomyFair"                   "EconomyPoor"                  
## [15] "NationalEconomyGettingBetter"  "NationalEconomyGettingWorse"  
## [17] "EconomicConfidenceIndex"
gallup_vars.active <- gallup_vars[, 2:17]
head(gallup_vars.active[, 1:6])
## # A tibble: 6 x 6
##   CongressionalApproval CongressionalDisapproval GOPApproval
##                   <dbl>                    <dbl>       <dbl>
## 1                  0.48                     0.45        0.32
## 2                  0.41                     0.51        0.31
## 3                  0.42                     0.51        0.33
## 4                  0.43                     0.51        0.33
## 5                  0.41                     0.52        0.34
## 6                  0.41                     0.52        0.32
## # ... with 3 more variables: IndependentPartyApproval <dbl>,
## #   DemocraticPartyIdentification <dbl>, GoodTimeFindQualityJob <dbl>
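
summary() adds an NA count to a column's block only when that column contains missing values, and no such counts appear in the output above. A more direct check is sketched below on a small invented data frame; the name polls and its values are hypothetical and do not come from the Gallup file.

```r
# Hypothetical stand-in for a polling table; values are invented
polls <- data.frame(
        Approval    = c(0.48, 0.41, NA, 0.43),
        Disapproval = c(0.45, 0.51, 0.51, NA),
        Index       = c(-5, -7, -6, -4)
)

# Number of missing values per column
na_per_column <- colSums(is.na(polls))
na_per_column

# Rows with no missing values in any column
complete_rows <- polls[complete.cases(polls), ]
nrow(complete_rows)
```

The same two calls can be applied to gallup_vars.active before running the PCA.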

Exploratory Data Analysis

Before performing the PCA, we carry out some exploratory data analysis: descriptive statistics, a correlation matrix, and a scatter plot matrix.

Perform Descriptive Statistics

gallup_vars.active_stats <- data.frame(
        Min = apply(gallup_vars.active, 2, min), # minimum
        Q1 = apply(gallup_vars.active, 2, quantile, 1/4), # First quartile
        Med = apply(gallup_vars.active, 2, median), # median
        Mean = apply(gallup_vars.active, 2, mean), # mean
        Q3 = apply(gallup_vars.active, 2, quantile, 3/4), # Third quartile
        Max = apply(gallup_vars.active, 2, max) # Maximum
)

gallup_vars.active_stats <- round(gallup_vars.active_stats, 1)
head(gallup_vars.active_stats)
##                               Min  Q1 Med Mean  Q3 Max
## CongressionalApproval         0.1 0.2 0.2  0.2 0.3 0.5
## CongressionalDisapproval      0.4 0.6 0.7  0.7 0.8 0.9
## GOPApproval                   0.0 0.3 0.3  0.3 0.3 0.4
## IndependentPartyApproval      0.0 0.4 0.4  0.4 0.4 0.4
## DemocraticPartyIdentification 0.0 0.3 0.3  0.3 0.3 0.4
## GoodTimeFindQualityJob        0.1 0.2 0.3  0.3 0.4 0.6

Note that you can also use the built-in R function summary() for descriptive statistics, but its output for a data frame can be hard to read.
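
A table of this kind can also be built in a single pass with sapply(). The sketch below is one possible variant, shown on the built-in USArrests data as a stand-in; the helper name stats_fun is hypothetical.

```r
# Helper returning the six statistics for one numeric vector
stats_fun <- function(x) {
        c(Min  = min(x),
          Q1   = unname(quantile(x, 0.25)),
          Med  = median(x),
          Mean = mean(x),
          Q3   = unname(quantile(x, 0.75)),
          Max  = max(x))
}

# sapply() returns one column per variable; t() puts the variables in rows
desc_stats <- round(t(sapply(USArrests, stats_fun)), 1)
desc_stats
```

For variables measured as proportions, rounding to two or three decimals rather than one keeps more of the detail visible.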

Correlation Matrix

The correlation between variables can be calculated as follows:

cor.mat <- round(cor(gallup_vars.active, method="pearson"),2)
head(cor.mat[, 1:6])
##                               CongressionalApproval
## CongressionalApproval                          1.00
## CongressionalDisapproval                      -0.99
## GOPApproval                                    0.59
## IndependentPartyApproval                      -0.55
## DemocraticPartyIdentification                  0.34
## GoodTimeFindQualityJob                         0.19
##                               CongressionalDisapproval GOPApproval
## CongressionalApproval                            -0.99        0.59
## CongressionalDisapproval                          1.00       -0.56
## GOPApproval                                      -0.56        1.00
## IndependentPartyApproval                          0.56       -0.14
## DemocraticPartyIdentification                    -0.34        0.60
## GoodTimeFindQualityJob                           -0.19        0.19
##                               IndependentPartyApproval
## CongressionalApproval                            -0.55
## CongressionalDisapproval                          0.56
## GOPApproval                                      -0.14
## IndependentPartyApproval                          1.00
## DemocraticPartyIdentification                     0.07
## GoodTimeFindQualityJob                           -0.11
##                               DemocraticPartyIdentification
## CongressionalApproval                                  0.34
## CongressionalDisapproval                              -0.34
## GOPApproval                                            0.60
## IndependentPartyApproval                               0.07
## DemocraticPartyIdentification                          1.00
## GoodTimeFindQualityJob                                 0.03
##                               GoodTimeFindQualityJob
## CongressionalApproval                           0.19
## CongressionalDisapproval                       -0.19
## GOPApproval                                     0.19
## IndependentPartyApproval                       -0.11
## DemocraticPartyIdentification                   0.03
## GoodTimeFindQualityJob                          1.00
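
A large correlation matrix is easier to scan when the strongest pairs are listed explicitly. The sketch below shows one way to do this on the built-in mtcars data (a stand-in for the Gallup variables). The cutoff of 0.85 is an arbitrary assumption chosen for illustration.

```r
# Correlation matrix of a built-in data set, rounded as in the text
cm <- round(cor(mtcars, method = "pearson"), 2)

# Keep each pair once by looking only at the upper triangle
idx <- which(upper.tri(cm) & abs(cm) >= 0.85, arr.ind = TRUE)

strong_pairs <- data.frame(
        var1 = rownames(cm)[idx[, "row"]],
        var2 = colnames(cm)[idx[, "col"]],
        r    = cm[idx]
)

# Strongest relationships first
strong_pairs[order(-abs(strong_pairs$r)), ]
```

Applied to cor.mat, the same steps would list pairs such as CongressionalApproval and CongressionalDisapproval, whose correlation is -0.99 in the output above.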

Correlation Matrix with Significance Levels (p-value)

The function rcorr() in the Hmisc package can be used to compute the significance levels for Pearson and Spearman correlations. It returns both the correlation coefficients and the p-values of the correlations for all possible pairs of columns in the data table.
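
As background on what such a p-value means, the sketch below computes the two-sided p-value of a single Pearson correlation by hand from the t distribution and compares it with base R's cor.test(). The data are simulated, and cor.test() is used here instead of Hmisc, so this illustrates the standard test rather than rcorr() internals.

```r
# Simulated pair of variables (hypothetical data)
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)

# Pearson correlation and its t statistic with n - 2 degrees of freedom
r        <- cor(x, y)
t_stat   <- r * sqrt((n - 2) / (1 - r^2))
p_manual <- 2 * pt(-abs(t_stat), df = n - 2)

# Base R's cor.test() reports the same two-sided p-value
p_builtin <- cor.test(x, y, method = "pearson")$p.value
c(r = r, p_manual = p_manual, p_builtin = p_builtin)
```

For the Gallup variables, the matrix of p-values is obtained with rcorr() as follows.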

library(Hmisc)

rcorr.mat <- rcorr(as.matrix(gallup_vars.active))
head(rcorr.mat,1)
## $r
##                               CongressionalApproval
## CongressionalApproval                     1.0000000
## CongressionalDisapproval                 -0.9875522
## GOPApproval                               0.5855030
## IndependentPartyApproval                 -0.5464272
## DemocraticPartyIdentification             0.3443272
## GoodTimeFindQualityJob                    0.1869320
## BadTimeFindQualityJob                    -0.1854677
## PresidentialApproval                      0.2225860
## EconIssuesMostImpProblem                 -0.2743874
## EconomyExcellent                          0.3031937
## EconomyGood                               0.5650913
## EconomyFair                              -0.1641691
## EconomyPoor                              -0.4221181
## NationalEconomyGettingBetter              0.1682397
## NationalEconomyGettingWorse              -0.2966925
## EconomicConfidenceIndex                   0.4299047
##                               CongressionalDisapproval GOPApproval
## CongressionalApproval                       -0.9875522  0.58550298
## CongressionalDisapproval                     1.0000000 -0.56007004
## GOPApproval                                 -0.5600700  1.00000000
## IndependentPartyApproval                     0.5643029 -0.13592984
## DemocraticPartyIdentification               -0.3373878  0.60066056
## GoodTimeFindQualityJob                      -0.1895186  0.19385727
## BadTimeFindQualityJob                        0.1899981 -0.17974387
## PresidentialApproval                        -0.1746795  0.07124973
## EconIssuesMostImpProblem                     0.2632108 -0.33824196
## EconomyExcellent                            -0.3122056  0.28591835
## EconomyGood                                 -0.5574511  0.53546351
## EconomyFair                                  0.1801822  0.05031211
## EconomyPoor                                  0.4143568 -0.46309128
## NationalEconomyGettingBetter                -0.1203981  0.15808661
## NationalEconomyGettingWorse                  0.2514772 -0.26916093
## EconomicConfidenceIndex                     -0.3983937  0.42258760
##                               IndependentPartyApproval
## CongressionalApproval                      -0.54642725
## CongressionalDisapproval                    0.56430292
## GOPApproval                                -0.13592984
## IndependentPartyApproval                    1.00000000
## DemocraticPartyIdentification               0.07082908
## GoodTimeFindQualityJob                     -0.10914844
## BadTimeFindQualityJob                       0.11957784
## PresidentialApproval                       -0.19479504
## EconIssuesMostImpProblem                    0.12851901
## EconomyExcellent                           -0.20641293
## EconomyGood                                -0.34615344
## EconomyFair                                 0.12156745
## EconomyPoor                                 0.26416707
## NationalEconomyGettingBetter               -0.15463102
## NationalEconomyGettingWorse                 0.23454432
## EconomicConfidenceIndex                    -0.29717180
##                               DemocraticPartyIdentification
## CongressionalApproval                            0.34432718
## CongressionalDisapproval                        -0.33738780
## GOPApproval                                      0.60066056
## IndependentPartyApproval                         0.07082908
## DemocraticPartyIdentification                    1.00000000
## GoodTimeFindQualityJob                           0.02684049
## BadTimeFindQualityJob                           -0.03971621
## PresidentialApproval                            -0.02656408
## EconIssuesMostImpProblem                        -0.03842990
## EconomyExcellent                                 0.20025048
## EconomyGood                                      0.13251662
## EconomyFair                                     -0.32990655
## EconomyPoor                                     -0.03046459
## NationalEconomyGettingBetter                    -0.22954443
## NationalEconomyGettingWorse                      0.15830640
## EconomicConfidenceIndex                         -0.06126272
##                               GoodTimeFindQualityJob BadTimeFindQualityJob
## CongressionalApproval                     0.18693204           -0.18546768
## CongressionalDisapproval                 -0.18951865            0.18999811
## GOPApproval                               0.19385727           -0.17974387
## IndependentPartyApproval                 -0.10914844            0.11957784
## DemocraticPartyIdentification             0.02684049           -0.03971621
## GoodTimeFindQualityJob                    1.00000000           -0.99634165
## BadTimeFindQualityJob                    -0.99634165            1.00000000
## PresidentialApproval                     -0.32575768            0.34725454
## EconIssuesMostImpProblem                 -0.87116969            0.85677624
## EconomyExcellent                          0.68063915           -0.68506318
## EconomyGood                               0.79419756           -0.78634655
## EconomyFair                               0.30673668           -0.28007206
## EconomyPoor                              -0.79958552            0.78620100
## NationalEconomyGettingBetter              0.09783312           -0.06698129
## NationalEconomyGettingWorse              -0.16194813            0.13182797
## EconomicConfidenceIndex                   0.56908137           -0.54517692
##                               PresidentialApproval
## CongressionalApproval                   0.22258599
## CongressionalDisapproval               -0.17467946
## GOPApproval                             0.07124973
## IndependentPartyApproval               -0.19479504
## DemocraticPartyIdentification          -0.02656408
## GoodTimeFindQualityJob                 -0.32575768
## BadTimeFindQualityJob                   0.34725454
## PresidentialApproval                    1.00000000
## EconIssuesMostImpProblem                0.28078634
## EconomyExcellent                       -0.38563430
## EconomyGood                            -0.19438310
## EconomyFair                             0.01218046
## EconomyPoor                             0.20409261
## NationalEconomyGettingBetter            0.52043772
## NationalEconomyGettingWorse            -0.48030654
## EconomicConfidenceIndex                 0.16162841
##                               EconIssuesMostImpProblem EconomyExcellent
## CongressionalApproval                       -0.2743874       0.30319369
## CongressionalDisapproval                     0.2632108      -0.31220561
## GOPApproval                                 -0.3382420       0.28591835
## IndependentPartyApproval                     0.1285190      -0.20641293
## DemocraticPartyIdentification               -0.0384299       0.20025048
## GoodTimeFindQualityJob                      -0.8711697       0.68063915
## BadTimeFindQualityJob                        0.8567762      -0.68506318
## PresidentialApproval                         0.2807863      -0.38563430
## EconIssuesMostImpProblem                     1.0000000      -0.76395047
## EconomyExcellent                            -0.7639505       1.00000000
## EconomyGood                                 -0.8653511       0.77358299
## EconomyFair                                 -0.4181415       0.05637205
## EconomyPoor                                  0.9003488      -0.75651586
## NationalEconomyGettingBetter                -0.2671433       0.09074272
## NationalEconomyGettingWorse                  0.3568509      -0.20934449
## EconomicConfidenceIndex                     -0.7301230       0.57749361
##                               EconomyGood EconomyFair EconomyPoor
## CongressionalApproval           0.5650913 -0.16416907 -0.42211813
## CongressionalDisapproval       -0.5574511  0.18018225  0.41435677
## GOPApproval                     0.5354635  0.05031211 -0.46309128
## IndependentPartyApproval       -0.3461534  0.12156745  0.26416707
## DemocraticPartyIdentification   0.1325166 -0.32990655 -0.03046459
## GoodTimeFindQualityJob          0.7941976  0.30673668 -0.79958552
## BadTimeFindQualityJob          -0.7863466 -0.28007206  0.78620100
## PresidentialApproval           -0.1943831  0.01218046  0.20409261
## EconIssuesMostImpProblem       -0.8653511 -0.41814154  0.90034884
## EconomyExcellent                0.7735830  0.05637205 -0.75651586
## EconomyGood                     1.0000000  0.26472679 -0.95670152
## EconomyFair                     0.2647268  1.00000000 -0.51319015
## EconomyPoor                    -0.9567015 -0.51319015  1.00000000
## NationalEconomyGettingBetter    0.3032012  0.51364684 -0.40204749
## NationalEconomyGettingWorse    -0.4294280 -0.50058699  0.51290995
## EconomicConfidenceIndex         0.8139009  0.52972317 -0.86882377
##                               NationalEconomyGettingBetter
## CongressionalApproval                           0.16823968
## CongressionalDisapproval                       -0.12039809
## GOPApproval                                     0.15808661
## IndependentPartyApproval                       -0.15463102
## DemocraticPartyIdentification                  -0.22954443
## GoodTimeFindQualityJob                          0.09783312
## BadTimeFindQualityJob                          -0.06698129
## PresidentialApproval                            0.52043772
## EconIssuesMostImpProblem                       -0.26714334
## EconomyExcellent                                0.09074272
## EconomyGood                                     0.30320123
## EconomyFair                                     0.51364684
## EconomyPoor                                    -0.40204749
## NationalEconomyGettingBetter                    1.00000000
## NationalEconomyGettingWorse                    -0.97982675
## EconomicConfidenceIndex                         0.79125065
##                               NationalEconomyGettingWorse
## CongressionalApproval                          -0.2966925
## CongressionalDisapproval                        0.2514772
## GOPApproval                                    -0.2691609
## IndependentPartyApproval                        0.2345443
## DemocraticPartyIdentification                   0.1583064
## GoodTimeFindQualityJob                         -0.1619481
## BadTimeFindQualityJob                           0.1318280
## PresidentialApproval                           -0.4803065
## EconIssuesMostImpProblem                        0.3568509
## EconomyExcellent                               -0.2093445
## EconomyGood                                    -0.4294280
## EconomyFair                                    -0.5005870
## EconomyPoor                                     0.5129099
## NationalEconomyGettingBetter                   -0.9798267
## NationalEconomyGettingWorse                     1.0000000
## EconomicConfidenceIndex                        -0.8645731
##                               EconomicConfidenceIndex
## CongressionalApproval                      0.42990467
## CongressionalDisapproval                  -0.39839372
## GOPApproval                                0.42258760
## IndependentPartyApproval                  -0.29717180
## DemocraticPartyIdentification             -0.06126272
## GoodTimeFindQualityJob                     0.56908137
## BadTimeFindQualityJob                     -0.54517692
## PresidentialApproval                       0.16162841
## EconIssuesMostImpProblem                  -0.73012298
## EconomyExcellent                           0.57749361
## EconomyGood                                0.81390089
## EconomyFair                                0.52972317
## EconomyPoor                               -0.86882377
## NationalEconomyGettingBetter               0.79125065
## NationalEconomyGettingWorse               -0.86457312
## EconomicConfidenceIndex                    1.00000000
# Extract the correlation coefficients
rcorr.mat$r
##                               CongressionalApproval
## CongressionalApproval                     1.0000000
## CongressionalDisapproval                 -0.9875522
## GOPApproval                               0.5855030
## IndependentPartyApproval                 -0.5464272
## DemocraticPartyIdentification             0.3443272
## GoodTimeFindQualityJob                    0.1869320
## BadTimeFindQualityJob                    -0.1854677
## PresidentialApproval                      0.2225860
## EconIssuesMostImpProblem                 -0.2743874
## EconomyExcellent                          0.3031937
## EconomyGood                               0.5650913
## EconomyFair                              -0.1641691
## EconomyPoor                              -0.4221181
## NationalEconomyGettingBetter              0.1682397
## NationalEconomyGettingWorse              -0.2966925
## EconomicConfidenceIndex                   0.4299047
##                               CongressionalDisapproval GOPApproval
## CongressionalApproval                       -0.9875522  0.58550298
## CongressionalDisapproval                     1.0000000 -0.56007004
## GOPApproval                                 -0.5600700  1.00000000
## IndependentPartyApproval                     0.5643029 -0.13592984
## DemocraticPartyIdentification               -0.3373878  0.60066056
## GoodTimeFindQualityJob                      -0.1895186  0.19385727
## BadTimeFindQualityJob                        0.1899981 -0.17974387
## PresidentialApproval                        -0.1746795  0.07124973
## EconIssuesMostImpProblem                     0.2632108 -0.33824196
## EconomyExcellent                            -0.3122056  0.28591835
## EconomyGood                                 -0.5574511  0.53546351
## EconomyFair                                  0.1801822  0.05031211
## EconomyPoor                                  0.4143568 -0.46309128
## NationalEconomyGettingBetter                -0.1203981  0.15808661
## NationalEconomyGettingWorse                  0.2514772 -0.26916093
## EconomicConfidenceIndex                     -0.3983937  0.42258760
##                               IndependentPartyApproval
## CongressionalApproval                      -0.54642725
## CongressionalDisapproval                    0.56430292
## GOPApproval                                -0.13592984
## IndependentPartyApproval                    1.00000000
## DemocraticPartyIdentification               0.07082908
## GoodTimeFindQualityJob                     -0.10914844
## BadTimeFindQualityJob                       0.11957784
## PresidentialApproval                       -0.19479504
## EconIssuesMostImpProblem                    0.12851901
## EconomyExcellent                           -0.20641293
## EconomyGood                                -0.34615344
## EconomyFair                                 0.12156745
## EconomyPoor                                 0.26416707
## NationalEconomyGettingBetter               -0.15463102
## NationalEconomyGettingWorse                 0.23454432
## EconomicConfidenceIndex                    -0.29717180
##                               DemocraticPartyIdentification
## CongressionalApproval                            0.34432718
## CongressionalDisapproval                        -0.33738780
## GOPApproval                                      0.60066056
## IndependentPartyApproval                         0.07082908
## DemocraticPartyIdentification                    1.00000000
## GoodTimeFindQualityJob                           0.02684049
## BadTimeFindQualityJob                           -0.03971621
## PresidentialApproval                            -0.02656408
## EconIssuesMostImpProblem                        -0.03842990
## EconomyExcellent                                 0.20025048
## EconomyGood                                      0.13251662
## EconomyFair                                     -0.32990655
## EconomyPoor                                     -0.03046459
## NationalEconomyGettingBetter                    -0.22954443
## NationalEconomyGettingWorse                      0.15830640
## EconomicConfidenceIndex                         -0.06126272
##                               GoodTimeFindQualityJob BadTimeFindQualityJob
## CongressionalApproval                     0.18693204           -0.18546768
## CongressionalDisapproval                 -0.18951865            0.18999811
## GOPApproval                               0.19385727           -0.17974387
## IndependentPartyApproval                 -0.10914844            0.11957784
## DemocraticPartyIdentification             0.02684049           -0.03971621
## GoodTimeFindQualityJob                    1.00000000           -0.99634165
## BadTimeFindQualityJob                    -0.99634165            1.00000000
## PresidentialApproval                     -0.32575768            0.34725454
## EconIssuesMostImpProblem                 -0.87116969            0.85677624
## EconomyExcellent                          0.68063915           -0.68506318
## EconomyGood                               0.79419756           -0.78634655
## EconomyFair                               0.30673668           -0.28007206
## EconomyPoor                              -0.79958552            0.78620100
## NationalEconomyGettingBetter              0.09783312           -0.06698129
## NationalEconomyGettingWorse              -0.16194813            0.13182797
## EconomicConfidenceIndex                   0.56908137           -0.54517692
##                               PresidentialApproval
## CongressionalApproval                   0.22258599
## CongressionalDisapproval               -0.17467946
## GOPApproval                             0.07124973
## IndependentPartyApproval               -0.19479504
## DemocraticPartyIdentification          -0.02656408
## GoodTimeFindQualityJob                 -0.32575768
## BadTimeFindQualityJob                   0.34725454
## PresidentialApproval                    1.00000000
## EconIssuesMostImpProblem                0.28078634
## EconomyExcellent                       -0.38563430
## EconomyGood                            -0.19438310
## EconomyFair                             0.01218046
## EconomyPoor                             0.20409261
## NationalEconomyGettingBetter            0.52043772
## NationalEconomyGettingWorse            -0.48030654
## EconomicConfidenceIndex                 0.16162841
##                               EconIssuesMostImpProblem EconomyExcellent
## CongressionalApproval                       -0.2743874       0.30319369
## CongressionalDisapproval                     0.2632108      -0.31220561
## GOPApproval                                 -0.3382420       0.28591835
## IndependentPartyApproval                     0.1285190      -0.20641293
## DemocraticPartyIdentification               -0.0384299       0.20025048
## GoodTimeFindQualityJob                      -0.8711697       0.68063915
## BadTimeFindQualityJob                        0.8567762      -0.68506318
## PresidentialApproval                         0.2807863      -0.38563430
## EconIssuesMostImpProblem                     1.0000000      -0.76395047
## EconomyExcellent                            -0.7639505       1.00000000
## EconomyGood                                 -0.8653511       0.77358299
## EconomyFair                                 -0.4181415       0.05637205
## EconomyPoor                                  0.9003488      -0.75651586
## NationalEconomyGettingBetter                -0.2671433       0.09074272
## NationalEconomyGettingWorse                  0.3568509      -0.20934449
## EconomicConfidenceIndex                     -0.7301230       0.57749361
##                               EconomyGood EconomyFair EconomyPoor
## CongressionalApproval           0.5650913 -0.16416907 -0.42211813
## CongressionalDisapproval       -0.5574511  0.18018225  0.41435677
## GOPApproval                     0.5354635  0.05031211 -0.46309128
## IndependentPartyApproval       -0.3461534  0.12156745  0.26416707
## DemocraticPartyIdentification   0.1325166 -0.32990655 -0.03046459
## GoodTimeFindQualityJob          0.7941976  0.30673668 -0.79958552
## BadTimeFindQualityJob          -0.7863466 -0.28007206  0.78620100
## PresidentialApproval           -0.1943831  0.01218046  0.20409261
## EconIssuesMostImpProblem       -0.8653511 -0.41814154  0.90034884
## EconomyExcellent                0.7735830  0.05637205 -0.75651586
## EconomyGood                     1.0000000  0.26472679 -0.95670152
## EconomyFair                     0.2647268  1.00000000 -0.51319015
## EconomyPoor                    -0.9567015 -0.51319015  1.00000000
## NationalEconomyGettingBetter    0.3032012  0.51364684 -0.40204749
## NationalEconomyGettingWorse    -0.4294280 -0.50058699  0.51290995
## EconomicConfidenceIndex         0.8139009  0.52972317 -0.86882377
##                               NationalEconomyGettingBetter
## CongressionalApproval                           0.16823968
## CongressionalDisapproval                       -0.12039809
## GOPApproval                                     0.15808661
## IndependentPartyApproval                       -0.15463102
## DemocraticPartyIdentification                  -0.22954443
## GoodTimeFindQualityJob                          0.09783312
## BadTimeFindQualityJob                          -0.06698129
## PresidentialApproval                            0.52043772
## EconIssuesMostImpProblem                       -0.26714334
## EconomyExcellent                                0.09074272
## EconomyGood                                     0.30320123
## EconomyFair                                     0.51364684
## EconomyPoor                                    -0.40204749
## NationalEconomyGettingBetter                    1.00000000
## NationalEconomyGettingWorse                    -0.97982675
## EconomicConfidenceIndex                         0.79125065
##                               NationalEconomyGettingWorse
## CongressionalApproval                          -0.2966925
## CongressionalDisapproval                        0.2514772
## GOPApproval                                    -0.2691609
## IndependentPartyApproval                        0.2345443
## DemocraticPartyIdentification                   0.1583064
## GoodTimeFindQualityJob                         -0.1619481
## BadTimeFindQualityJob                           0.1318280
## PresidentialApproval                           -0.4803065
## EconIssuesMostImpProblem                        0.3568509
## EconomyExcellent                               -0.2093445
## EconomyGood                                    -0.4294280
## EconomyFair                                    -0.5005870
## EconomyPoor                                     0.5129099
## NationalEconomyGettingBetter                   -0.9798267
## NationalEconomyGettingWorse                     1.0000000
## EconomicConfidenceIndex                        -0.8645731
##                               EconomicConfidenceIndex
## CongressionalApproval                      0.42990467
## CongressionalDisapproval                  -0.39839372
## GOPApproval                                0.42258760
## IndependentPartyApproval                  -0.29717180
## DemocraticPartyIdentification             -0.06126272
## GoodTimeFindQualityJob                     0.56908137
## BadTimeFindQualityJob                     -0.54517692
## PresidentialApproval                       0.16162841
## EconIssuesMostImpProblem                  -0.73012298
## EconomyExcellent                           0.57749361
## EconomyGood                                0.81390089
## EconomyFair                                0.52972317
## EconomyPoor                               -0.86882377
## NationalEconomyGettingBetter               0.79125065
## NationalEconomyGettingWorse               -0.86457312
## EconomicConfidenceIndex                    1.00000000
# Extract the p-values
rcorr.mat$P
##                               CongressionalApproval
## CongressionalApproval                            NA
## CongressionalDisapproval               0.000000e+00
## GOPApproval                            2.220446e-16
## IndependentPartyApproval               4.529710e-14
## DemocraticPartyIdentification          6.774719e-06
## GoodTimeFindQualityJob                 1.687949e-02
## BadTimeFindQualityJob                  1.777501e-02
## PresidentialApproval                   4.292395e-03
## EconIssuesMostImpProblem               3.934164e-04
## EconomyExcellent                       8.349522e-05
## EconomyGood                            3.996803e-15
## EconomyFair                            3.625379e-02
## EconomyPoor                            1.996437e-08
## NationalEconomyGettingBetter           3.181431e-02
## NationalEconomyGettingWorse            1.201975e-04
## EconomicConfidenceIndex                1.021924e-08
##                               CongressionalDisapproval  GOPApproval
## CongressionalApproval                     0.000000e+00 2.220446e-16
## CongressionalDisapproval                            NA 7.549517e-15
## GOPApproval                               7.549517e-15           NA
## IndependentPartyApproval                  4.440892e-15 8.360889e-02
## DemocraticPartyIdentification             1.061928e-05 0.000000e+00
## GoodTimeFindQualityJob                    1.539328e-02 1.315576e-02
## BadTimeFindQualityJob                     1.513070e-02 2.168236e-02
## PresidentialApproval                      2.573616e-02 3.660991e-01
## EconIssuesMostImpProblem                  6.871645e-04 1.005354e-05
## EconomyExcellent                          4.966781e-05 2.157619e-04
## EconomyGood                               1.088019e-14 1.778577e-13
## EconomyFair                               2.135890e-02 5.236021e-01
## EconomyPoor                               3.827491e-08 4.833394e-10
## NationalEconomyGettingBetter              1.257979e-01 4.385502e-02
## NationalEconomyGettingWorse               1.203130e-03 5.121617e-04
## EconomicConfidenceIndex                   1.387460e-07 1.918349e-08
##                               IndependentPartyApproval
## CongressionalApproval                     4.529710e-14
## CongressionalDisapproval                  4.440892e-15
## GOPApproval                               8.360889e-02
## IndependentPartyApproval                            NA
## DemocraticPartyIdentification             3.689431e-01
## GoodTimeFindQualityJob                    1.654610e-01
## BadTimeFindQualityJob                     1.284157e-01
## PresidentialApproval                      1.271141e-02
## EconIssuesMostImpProblem                  1.020531e-01
## EconomyExcellent                          8.204077e-03
## EconomyGood                               6.008127e-06
## EconomyFair                               1.221374e-01
## EconomyPoor                               6.557562e-04
## NationalEconomyGettingBetter              4.873828e-02
## NationalEconomyGettingWorse               2.581949e-03
## EconomicConfidenceIndex                   1.170463e-04
##                               DemocraticPartyIdentification
## CongressionalApproval                          6.774719e-06
## CongressionalDisapproval                       1.061928e-05
## GOPApproval                                    0.000000e+00
## IndependentPartyApproval                       3.689431e-01
## DemocraticPartyIdentification                            NA
## GoodTimeFindQualityJob                         7.337808e-01
## BadTimeFindQualityJob                          6.147119e-01
## PresidentialApproval                           7.364208e-01
## EconIssuesMostImpProblem                       6.262270e-01
## EconomyExcellent                               1.037823e-02
## EconomyGood                                    9.174116e-02
## EconomyFair                                    1.703537e-05
## EconomyPoor                                    6.994654e-01
## NationalEconomyGettingBetter                   3.203189e-03
## NationalEconomyGettingWorse                    4.355878e-02
## EconomicConfidenceIndex                        4.372410e-01
##                               GoodTimeFindQualityJob BadTimeFindQualityJob
## CongressionalApproval                   1.687949e-02          1.777501e-02
## CongressionalDisapproval                1.539328e-02          1.513070e-02
## GOPApproval                             1.315576e-02          2.168236e-02
## IndependentPartyApproval                1.654610e-01          1.284157e-01
## DemocraticPartyIdentification           7.337808e-01          6.147119e-01
## GoodTimeFindQualityJob                            NA          0.000000e+00
## BadTimeFindQualityJob                   0.000000e+00                    NA
## PresidentialApproval                    2.202381e-05          5.586459e-06
## EconIssuesMostImpProblem                0.000000e+00          0.000000e+00
## EconomyExcellent                        0.000000e+00          0.000000e+00
## EconomyGood                             0.000000e+00          0.000000e+00
## EconomyFair                             6.820951e-05          2.935308e-04
## EconomyPoor                             0.000000e+00          0.000000e+00
## NationalEconomyGettingBetter            2.140817e-01          3.955846e-01
## NationalEconomyGettingWorse             3.888920e-02          9.345588e-02
## EconomicConfidenceIndex                 2.220446e-15          5.284662e-14
##                               PresidentialApproval
## CongressionalApproval                 4.292395e-03
## CongressionalDisapproval              2.573616e-02
## GOPApproval                           3.660991e-01
## IndependentPartyApproval              1.271141e-02
## DemocraticPartyIdentification         7.364208e-01
## GoodTimeFindQualityJob                2.202381e-05
## BadTimeFindQualityJob                 5.586459e-06
## PresidentialApproval                            NA
## EconIssuesMostImpProblem              2.827989e-04
## EconomyExcellent                      3.703808e-07
## EconomyGood                           1.290495e-02
## EconomyFair                           8.773585e-01
## EconomyPoor                           8.970099e-03
## NationalEconomyGettingBetter          1.074252e-12
## NationalEconomyGettingWorse           8.686030e-11
## EconomicConfidenceIndex               3.928154e-02
##                               EconIssuesMostImpProblem EconomyExcellent
## CongressionalApproval                     3.934164e-04     8.349522e-05
## CongressionalDisapproval                  6.871645e-04     4.966781e-05
## GOPApproval                               1.005354e-05     2.157619e-04
## IndependentPartyApproval                  1.020531e-01     8.204077e-03
## DemocraticPartyIdentification             6.262270e-01     1.037823e-02
## GoodTimeFindQualityJob                    0.000000e+00     0.000000e+00
## BadTimeFindQualityJob                     0.000000e+00     0.000000e+00
## PresidentialApproval                      2.827989e-04     3.703808e-07
## EconIssuesMostImpProblem                            NA     0.000000e+00
## EconomyExcellent                          0.000000e+00               NA
## EconomyGood                               0.000000e+00     0.000000e+00
## EconomyFair                               2.792352e-08     4.747698e-01
## EconomyPoor                               0.000000e+00     0.000000e+00
## NationalEconomyGettingBetter              5.662657e-04     2.493260e-01
## NationalEconomyGettingWorse               2.928352e-06     7.319513e-03
## EconomicConfidenceIndex                   0.000000e+00     6.661338e-16
##                                EconomyGood  EconomyFair  EconomyPoor
## CongressionalApproval         3.996803e-15 3.625379e-02 1.996437e-08
## CongressionalDisapproval      1.088019e-14 2.135890e-02 3.827491e-08
## GOPApproval                   1.778577e-13 5.236021e-01 4.833394e-10
## IndependentPartyApproval      6.008127e-06 1.221374e-01 6.557562e-04
## DemocraticPartyIdentification 9.174116e-02 1.703537e-05 6.994654e-01
## GoodTimeFindQualityJob        0.000000e+00 6.820951e-05 0.000000e+00
## BadTimeFindQualityJob         0.000000e+00 2.935308e-04 0.000000e+00
## PresidentialApproval          1.290495e-02 8.773585e-01 8.970099e-03
## EconIssuesMostImpProblem      0.000000e+00 2.792352e-08 0.000000e+00
## EconomyExcellent              0.000000e+00 4.747698e-01 0.000000e+00
## EconomyGood                             NA 6.379910e-04 0.000000e+00
## EconomyFair                   6.379910e-04           NA 2.478906e-12
## EconomyPoor                   0.000000e+00 2.478906e-12           NA
## NationalEconomyGettingBetter  8.345952e-05 2.353229e-12 1.039372e-07
## NationalEconomyGettingWorse   1.065208e-08 1.013412e-11 2.559286e-12
## EconomicConfidenceIndex       0.000000e+00 3.572698e-13 0.000000e+00
##                               NationalEconomyGettingBetter
## CongressionalApproval                         3.181431e-02
## CongressionalDisapproval                      1.257979e-01
## GOPApproval                                   4.385502e-02
## IndependentPartyApproval                      4.873828e-02
## DemocraticPartyIdentification                 3.203189e-03
## GoodTimeFindQualityJob                        2.140817e-01
## BadTimeFindQualityJob                         3.955846e-01
## PresidentialApproval                          1.074252e-12
## EconIssuesMostImpProblem                      5.662657e-04
## EconomyExcellent                              2.493260e-01
## EconomyGood                                   8.345952e-05
## EconomyFair                                   2.353229e-12
## EconomyPoor                                   1.039372e-07
## NationalEconomyGettingBetter                            NA
## NationalEconomyGettingWorse                   0.000000e+00
## EconomicConfidenceIndex                       0.000000e+00
##                               NationalEconomyGettingWorse
## CongressionalApproval                        1.201975e-04
## CongressionalDisapproval                     1.203130e-03
## GOPApproval                                  5.121617e-04
## IndependentPartyApproval                     2.581949e-03
## DemocraticPartyIdentification                4.355878e-02
## GoodTimeFindQualityJob                       3.888920e-02
## BadTimeFindQualityJob                        9.345588e-02
## PresidentialApproval                         8.686030e-11
## EconIssuesMostImpProblem                     2.928352e-06
## EconomyExcellent                             7.319513e-03
## EconomyGood                                  1.065208e-08
## EconomyFair                                  1.013412e-11
## EconomyPoor                                  2.559286e-12
## NationalEconomyGettingBetter                 0.000000e+00
## NationalEconomyGettingWorse                            NA
## EconomicConfidenceIndex                      0.000000e+00
##                               EconomicConfidenceIndex
## CongressionalApproval                    1.021924e-08
## CongressionalDisapproval                 1.387460e-07
## GOPApproval                              1.918349e-08
## IndependentPartyApproval                 1.170463e-04
## DemocraticPartyIdentification            4.372410e-01
## GoodTimeFindQualityJob                   2.220446e-15
## BadTimeFindQualityJob                    5.284662e-14
## PresidentialApproval                     3.928154e-02
## EconIssuesMostImpProblem                 0.000000e+00
## EconomyExcellent                         6.661338e-16
## EconomyGood                              0.000000e+00
## EconomyFair                              3.572698e-13
## EconomyPoor                              0.000000e+00
## NationalEconomyGettingBetter             0.000000e+00
## NationalEconomyGettingWorse              0.000000e+00
## EconomicConfidenceIndex                            NA

The output of the function rcorr() is a list containing the following elements:

  • r: the correlation matrix
  • n: the matrix of the number of observations used in analyzing each pair of variables
  • P: the p-values corresponding to the significance levels of the correlations
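
As a rough illustration of how these three elements relate, the sketch below rebuilds them with base R only on made-up data. The data frame `toy` and every name in it are hypothetical, and this is not Hmisc's own implementation; it assumes Pearson correlations with pairwise deletion of missing values.

```r
# Hypothetical toy data with a few missing values (not the Gallup data)
set.seed(42)
toy <- data.frame(a = rnorm(30), b = rnorm(30), c = rnorm(30))
toy$b[1:3] <- NA

# r: pairwise Pearson correlations
r <- cor(toy, use = "pairwise.complete.obs")

# n: number of complete observations available for each pair
ok <- (!is.na(as.matrix(toy))) + 0   # 1 where a value is present, 0 where it is missing
n <- crossprod(ok)

# P: two-sided p-values from the t statistic with n - 2 degrees of freedom
tstat <- r * sqrt((n - 2) / (1 - r^2))
P <- 2 * pt(-abs(tstat), df = n - 2)
diag(P) <- NA                        # a variable is not tested against itself
```

Each off-diagonal entry of `P` should agree with `cor.test()` applied to the same pair, for example `cor.test(toy$a, toy$b)$p.value`.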

Formatting the Correlation Matrix in Four Column Tables

The flattenCorrMatrix function reshapes the correlation matrix into a four-column table with one row per pair of variables: the row variable name, the column variable name, the correlation coefficient for that pair, and its p-value.
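
The sourced file flattenCorrMatrix.R is not reproduced in this study. A minimal sketch of such a helper, assuming it walks the upper triangle of the matrix, might look like the following. The name `flatten_corr_sketch` and the toy matrices are hypothetical.

```r
# Hypothetical stand-in for the sourced helper: one row per variable pair
flatten_corr_sketch <- function(cormat, pmat) {
  ut <- upper.tri(cormat)            # TRUE for cells above the diagonal
  data.frame(
    row    = rownames(cormat)[row(cormat)[ut]],
    column = rownames(cormat)[col(cormat)[ut]],
    cor    = cormat[ut],
    p      = pmat[ut],
    stringsAsFactors = FALSE
  )
}

# Made-up 3 x 3 correlation and p-value matrices for illustration
vars  <- c("a", "b", "c")
toy_r <- matrix(c(1, 0.5, -0.2,
                  0.5, 1, 0.3,
                  -0.2, 0.3, 1), nrow = 3, dimnames = list(vars, vars))
toy_p <- matrix(c(NA, 0.01, 0.40,
                  0.01, NA, 0.20,
                  0.40, 0.20, NA), nrow = 3, dimnames = list(vars, vars))

flat <- flatten_corr_sketch(toy_r, toy_p)
```

Because only the upper triangle is kept, k variables yield k(k - 1)/2 rows, which is consistent with the 120 rows shown for the sixteen Gallup variables.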

library(Hmisc)
source('C:/Users/Michael/Desktop/R Code and Data/data/gallup/flattenCorrMatrix.R')
res2 <- rcorr(as.matrix(gallup_vars.active))
flattenCorrMatrix(res2$r, res2$P)
##                               row                        column
## 1           CongressionalApproval      CongressionalDisapproval
## 2           CongressionalApproval                   GOPApproval
## 3        CongressionalDisapproval                   GOPApproval
## 4           CongressionalApproval      IndependentPartyApproval
## 5        CongressionalDisapproval      IndependentPartyApproval
## 6                     GOPApproval      IndependentPartyApproval
## 7           CongressionalApproval DemocraticPartyIdentification
## 8        CongressionalDisapproval DemocraticPartyIdentification
## 9                     GOPApproval DemocraticPartyIdentification
## 10       IndependentPartyApproval DemocraticPartyIdentification
## 11          CongressionalApproval        GoodTimeFindQualityJob
## 12       CongressionalDisapproval        GoodTimeFindQualityJob
## 13                    GOPApproval        GoodTimeFindQualityJob
## 14       IndependentPartyApproval        GoodTimeFindQualityJob
## 15  DemocraticPartyIdentification        GoodTimeFindQualityJob
## 16          CongressionalApproval         BadTimeFindQualityJob
## 17       CongressionalDisapproval         BadTimeFindQualityJob
## 18                    GOPApproval         BadTimeFindQualityJob
## 19       IndependentPartyApproval         BadTimeFindQualityJob
## 20  DemocraticPartyIdentification         BadTimeFindQualityJob
## 21         GoodTimeFindQualityJob         BadTimeFindQualityJob
## 22          CongressionalApproval          PresidentialApproval
## 23       CongressionalDisapproval          PresidentialApproval
## 24                    GOPApproval          PresidentialApproval
## 25       IndependentPartyApproval          PresidentialApproval
## 26  DemocraticPartyIdentification          PresidentialApproval
## 27         GoodTimeFindQualityJob          PresidentialApproval
## 28          BadTimeFindQualityJob          PresidentialApproval
## 29          CongressionalApproval      EconIssuesMostImpProblem
## 30       CongressionalDisapproval      EconIssuesMostImpProblem
## 31                    GOPApproval      EconIssuesMostImpProblem
## 32       IndependentPartyApproval      EconIssuesMostImpProblem
## 33  DemocraticPartyIdentification      EconIssuesMostImpProblem
## 34         GoodTimeFindQualityJob      EconIssuesMostImpProblem
## 35          BadTimeFindQualityJob      EconIssuesMostImpProblem
## 36           PresidentialApproval      EconIssuesMostImpProblem
## 37          CongressionalApproval              EconomyExcellent
## 38       CongressionalDisapproval              EconomyExcellent
## 39                    GOPApproval              EconomyExcellent
## 40       IndependentPartyApproval              EconomyExcellent
## 41  DemocraticPartyIdentification              EconomyExcellent
## 42         GoodTimeFindQualityJob              EconomyExcellent
## 43          BadTimeFindQualityJob              EconomyExcellent
## 44           PresidentialApproval              EconomyExcellent
## 45       EconIssuesMostImpProblem              EconomyExcellent
## 46          CongressionalApproval                   EconomyGood
## 47       CongressionalDisapproval                   EconomyGood
## 48                    GOPApproval                   EconomyGood
## 49       IndependentPartyApproval                   EconomyGood
## 50  DemocraticPartyIdentification                   EconomyGood
## 51         GoodTimeFindQualityJob                   EconomyGood
## 52          BadTimeFindQualityJob                   EconomyGood
## 53           PresidentialApproval                   EconomyGood
## 54       EconIssuesMostImpProblem                   EconomyGood
## 55               EconomyExcellent                   EconomyGood
## 56          CongressionalApproval                   EconomyFair
## 57       CongressionalDisapproval                   EconomyFair
## 58                    GOPApproval                   EconomyFair
## 59       IndependentPartyApproval                   EconomyFair
## 60  DemocraticPartyIdentification                   EconomyFair
## 61         GoodTimeFindQualityJob                   EconomyFair
## 62          BadTimeFindQualityJob                   EconomyFair
## 63           PresidentialApproval                   EconomyFair
## 64       EconIssuesMostImpProblem                   EconomyFair
## 65               EconomyExcellent                   EconomyFair
## 66                    EconomyGood                   EconomyFair
## 67          CongressionalApproval                   EconomyPoor
## 68       CongressionalDisapproval                   EconomyPoor
## 69                    GOPApproval                   EconomyPoor
## 70       IndependentPartyApproval                   EconomyPoor
## 71  DemocraticPartyIdentification                   EconomyPoor
## 72         GoodTimeFindQualityJob                   EconomyPoor
## 73          BadTimeFindQualityJob                   EconomyPoor
## 74           PresidentialApproval                   EconomyPoor
## 75       EconIssuesMostImpProblem                   EconomyPoor
## 76               EconomyExcellent                   EconomyPoor
## 77                    EconomyGood                   EconomyPoor
## 78                    EconomyFair                   EconomyPoor
## 79          CongressionalApproval  NationalEconomyGettingBetter
## 80       CongressionalDisapproval  NationalEconomyGettingBetter
## 81                    GOPApproval  NationalEconomyGettingBetter
## 82       IndependentPartyApproval  NationalEconomyGettingBetter
## 83  DemocraticPartyIdentification  NationalEconomyGettingBetter
## 84         GoodTimeFindQualityJob  NationalEconomyGettingBetter
## 85          BadTimeFindQualityJob  NationalEconomyGettingBetter
## 86           PresidentialApproval  NationalEconomyGettingBetter
## 87       EconIssuesMostImpProblem  NationalEconomyGettingBetter
## 88               EconomyExcellent  NationalEconomyGettingBetter
## 89                    EconomyGood  NationalEconomyGettingBetter
## 90                    EconomyFair  NationalEconomyGettingBetter
## 91                    EconomyPoor  NationalEconomyGettingBetter
## 92          CongressionalApproval   NationalEconomyGettingWorse
## 93       CongressionalDisapproval   NationalEconomyGettingWorse
## 94                    GOPApproval   NationalEconomyGettingWorse
## 95       IndependentPartyApproval   NationalEconomyGettingWorse
## 96  DemocraticPartyIdentification   NationalEconomyGettingWorse
## 97         GoodTimeFindQualityJob   NationalEconomyGettingWorse
## 98          BadTimeFindQualityJob   NationalEconomyGettingWorse
## 99           PresidentialApproval   NationalEconomyGettingWorse
## 100      EconIssuesMostImpProblem   NationalEconomyGettingWorse
## 101              EconomyExcellent   NationalEconomyGettingWorse
## 102                   EconomyGood   NationalEconomyGettingWorse
## 103                   EconomyFair   NationalEconomyGettingWorse
## 104                   EconomyPoor   NationalEconomyGettingWorse
## 105  NationalEconomyGettingBetter   NationalEconomyGettingWorse
## 106         CongressionalApproval       EconomicConfidenceIndex
## 107      CongressionalDisapproval       EconomicConfidenceIndex
## 108                   GOPApproval       EconomicConfidenceIndex
## 109      IndependentPartyApproval       EconomicConfidenceIndex
## 110 DemocraticPartyIdentification       EconomicConfidenceIndex
## 111        GoodTimeFindQualityJob       EconomicConfidenceIndex
## 112         BadTimeFindQualityJob       EconomicConfidenceIndex
## 113          PresidentialApproval       EconomicConfidenceIndex
## 114      EconIssuesMostImpProblem       EconomicConfidenceIndex
## 115              EconomyExcellent       EconomicConfidenceIndex
## 116                   EconomyGood       EconomicConfidenceIndex
## 117                   EconomyFair       EconomicConfidenceIndex
## 118                   EconomyPoor       EconomicConfidenceIndex
## 119  NationalEconomyGettingBetter       EconomicConfidenceIndex
## 120   NationalEconomyGettingWorse       EconomicConfidenceIndex
##             cor            p
## 1   -0.98755223 0.000000e+00
## 2    0.58550298 2.220446e-16
## 3   -0.56007004 7.549517e-15
## 4   -0.54642725 4.529710e-14
## 5    0.56430292 4.440892e-15
## 6   -0.13592984 8.360889e-02
## 7    0.34432718 6.774719e-06
## 8   -0.33738780 1.061928e-05
## 9    0.60066056 0.000000e+00
## 10   0.07082908 3.689431e-01
## 11   0.18693204 1.687949e-02
## 12  -0.18951865 1.539328e-02
## 13   0.19385727 1.315576e-02
## 14  -0.10914844 1.654610e-01
## 15   0.02684049 7.337808e-01
## 16  -0.18546768 1.777501e-02
## 17   0.18999811 1.513070e-02
## 18  -0.17974387 2.168236e-02
## 19   0.11957784 1.284157e-01
## 20  -0.03971621 6.147119e-01
## 21  -0.99634165 0.000000e+00
## 22   0.22258599 4.292395e-03
## 23  -0.17467946 2.573616e-02
## 24   0.07124973 3.660991e-01
## 25  -0.19479504 1.271141e-02
## 26  -0.02656408 7.364208e-01
## 27  -0.32575768 2.202381e-05
## 28   0.34725454 5.586459e-06
## 29  -0.27438745 3.934164e-04
## 30   0.26321083 6.871645e-04
## 31  -0.33824196 1.005354e-05
## 32   0.12851901 1.020531e-01
## 33  -0.03842990 6.262270e-01
## 34  -0.87116969 0.000000e+00
## 35   0.85677624 0.000000e+00
## 36   0.28078634 2.827989e-04
## 37   0.30319369 8.349522e-05
## 38  -0.31220561 4.966781e-05
## 39   0.28591835 2.157619e-04
## 40  -0.20641293 8.204077e-03
## 41   0.20025048 1.037823e-02
## 42   0.68063915 0.000000e+00
## 43  -0.68506318 0.000000e+00
## 44  -0.38563430 3.703808e-07
## 45  -0.76395047 0.000000e+00
## 46   0.56509131 3.996803e-15
## 47  -0.55745113 1.088019e-14
## 48   0.53546351 1.778577e-13
## 49  -0.34615344 6.008127e-06
## 50   0.13251662 9.174116e-02
## 51   0.79419756 0.000000e+00
## 52  -0.78634655 0.000000e+00
## 53  -0.19438310 1.290495e-02
## 54  -0.86535108 0.000000e+00
## 55   0.77358299 0.000000e+00
## 56  -0.16416907 3.625379e-02
## 57   0.18018225 2.135890e-02
## 58   0.05031211 5.236021e-01
## 59   0.12156745 1.221374e-01
## 60  -0.32990655 1.703537e-05
## 61   0.30673668 6.820951e-05
## 62  -0.28007206 2.935308e-04
## 63   0.01218046 8.773585e-01
## 64  -0.41814154 2.792352e-08
## 65   0.05637205 4.747698e-01
## 66   0.26472679 6.379910e-04
## 67  -0.42211813 1.996437e-08
## 68   0.41435677 3.827491e-08
## 69  -0.46309128 4.833394e-10
## 70   0.26416707 6.557562e-04
## 71  -0.03046459 6.994654e-01
## 72  -0.79958552 0.000000e+00
## 73   0.78620100 0.000000e+00
## 74   0.20409261 8.970099e-03
## 75   0.90034884 0.000000e+00
## 76  -0.75651586 0.000000e+00
## 77  -0.95670152 0.000000e+00
## 78  -0.51319015 2.478906e-12
## 79   0.16823968 3.181431e-02
## 80  -0.12039809 1.257979e-01
## 81   0.15808661 4.385502e-02
## 82  -0.15463102 4.873828e-02
## 83  -0.22954443 3.203189e-03
## 84   0.09783312 2.140817e-01
## 85  -0.06698129 3.955846e-01
## 86   0.52043772 1.074252e-12
## 87  -0.26714334 5.662657e-04
## 88   0.09074272 2.493260e-01
## 89   0.30320123 8.345952e-05
## 90   0.51364684 2.353229e-12
## 91  -0.40204749 1.039372e-07
## 92  -0.29669252 1.201975e-04
## 93   0.25147718 1.203130e-03
## 94  -0.26916093 5.121617e-04
## 95   0.23454432 2.581949e-03
## 96   0.15830640 4.355878e-02
## 97  -0.16194813 3.888920e-02
## 98   0.13182797 9.345588e-02
## 99  -0.48030654 8.686030e-11
## 100  0.35685092 2.928352e-06
## 101 -0.20934449 7.319513e-03
## 102 -0.42942801 1.065208e-08
## 103 -0.50058699 1.013412e-11
## 104  0.51290995 2.559286e-12
## 105 -0.97982675 0.000000e+00
## 106  0.42990467 1.021924e-08
## 107 -0.39839372 1.387460e-07
## 108  0.42258760 1.918349e-08
## 109 -0.29717180 1.170463e-04
## 110 -0.06126272 4.372410e-01
## 111  0.56908137 2.220446e-15
## 112 -0.54517692 5.284662e-14
## 113  0.16162841 3.928154e-02
## 114 -0.73012298 0.000000e+00
## 115  0.57749361 6.661338e-16
## 116  0.81390089 0.000000e+00
## 117  0.52972317 3.572698e-13
## 118 -0.86882377 0.000000e+00
## 119  0.79125065 0.000000e+00
## 120 -0.86457312 0.000000e+00

Visualization of the Correlation Matrix

We can visualize the correlation matrix using a correlogram, which requires the corrplot package. The function corrplot() takes the correlation matrix as its first argument. The argument type="upper" displays only the upper triangle of the correlation matrix, and order="hclust" reorders the variables by hierarchical clustering so that strongly correlated variables sit next to each other.

Positive correlations are displayed in blue and negative correlations in red. The color intensity and the size of each circle are proportional to the correlation coefficient. The color legend on the right side of the correlogram maps each color to its correlation coefficient.

library("corrplot")
corrplot(cor.mat, type="upper", order="hclust", 
         tl.col="black", tl.srt=45)
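
To give a feel for what a hierarchical-clustering order does, the sketch below reorders a small made-up correlation matrix with base R's hclust(), using 1 minus the correlation as the distance. This is one common way to obtain such an ordering and is an assumption for illustration; corrplot's internal procedure may differ in its details. All variable names are hypothetical.

```r
# Two hypothetical groups of correlated variables, deliberately interleaved
set.seed(7)
base_a <- rnorm(50)
base_b <- rnorm(50)
toy <- data.frame(
  a1 = base_a + rnorm(50, sd = 0.1),
  b1 = base_b + rnorm(50, sd = 0.1),
  a2 = base_a + rnorm(50, sd = 0.1),
  b2 = base_b + rnorm(50, sd = 0.1)
)
toy_cor <- cor(toy)

# Cluster on the distance 1 - correlation, then reorder rows and columns
hc <- hclust(as.dist(1 - toy_cor), method = "complete")
ordered_names   <- colnames(toy_cor)[hc$order]
toy_cor_ordered <- toy_cor[hc$order, hc$order]
```

After reordering, a1 and a2 are adjacent, as are b1 and b2, which is what makes blocks of related variables easy to spot in a correlogram.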

Using chart.Correlation() to Draw Scatter Plots

We can make a scatter plot matrix showing the correlation coefficients between variables and their significance levels. The package PerformanceAnalytics is required.

library("PerformanceAnalytics")
chart.Correlation(gallup_vars.active[, 1:6], histogram=TRUE, pch=19)

In the above plot:

  • The distribution of each variable is shown on the diagonal.
  • Below the diagonal: the bivariate scatter plots with a fitted line are displayed.
  • Above the diagonal: the value of the correlation plus the significance level as stars.
  • Each significance level is associated with a symbol: p-values(0, 0.001, 0.01, 0.05, 0.1, 1) <=> symbols("***", "**", "*", ".", " ")
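
The mapping from p-values to significance symbols can be reproduced with the base R function symnum(). This is only a minimal sketch: the p-values below are made-up illustrations chosen to fall into each band, not values from the Gallup data.

```r
# Hypothetical p-values, one per significance band (not from the Gallup data)
p_values <- c(0.0004, 0.004, 0.03, 0.08, 0.4)

# Map each p-value to a symbol using the cutpoints from the list above
stars <- symnum(p_values, corr = FALSE, na = FALSE,
                cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                symbols = c("***", "**", "*", ".", " "))

# Drop the symnum attributes to get a plain character vector
stars_chr <- as.character(stars)
print(data.frame(p_value = p_values, symbol = stars_chr))
```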

Modeling

Principal Component Analysis

The function PCA() in the FactoMineR package will be used to conduct our PCA study. A simplified format is:

PCA(X, scale.unit = TRUE, ncp = 5, graph = TRUE)

  • X: a data frame. Rows are individuals and columns are numeric variables
  • scale.unit: a logical value. If TRUE, the data are scaled to unit variance before the analysis.
    • This numeric standardization to the same scale avoids some variables becoming dominant just because of their large numeric measurement units.
  • ncp: number of dimensions kept in the final results.
  • graph: a logical value. If TRUE, then a graph is displayed.
# Perform the PCA
library(FactoMineR)
res.pca <- PCA(gallup_vars.active, graph = FALSE)
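
To illustrate why scaling to unit variance matters, here is a small sketch that does not use the Gallup data. It assumes the built-in USArrests data set and the base R function prcomp(), whose scale. argument plays a role comparable to scale.unit in PCA(). Without scaling, the variable with the largest numeric range tends to dominate the first component.

```r
# Illustration only: built-in USArrests data, base R prcomp() instead of PCA()
data(USArrests)

pca_raw <- prcomp(USArrests, scale. = FALSE)  # variables kept in original units
pca_std <- prcomp(USArrests, scale. = TRUE)   # variables scaled to unit variance

# Absolute loadings of each variable on the first component
load_raw <- abs(pca_raw$rotation[, 1])
load_std <- abs(pca_std$rotation[, 1])

print(round(rbind(unscaled = load_raw, scaled = load_std), 2))

# Variable that dominates the first unscaled component
dominant_raw <- names(which.max(load_raw))
print(dominant_raw)
```

In the unscaled analysis the first component is driven almost entirely by the variable measured on the largest scale, while the scaled analysis spreads the loadings more evenly.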

The output of the function PCA() is a list including:

print(res.pca)
## **Results for the Principal Component Analysis (PCA)**
## The analysis was performed on 163 individuals, described by 16 variables
## *The results are available in the following objects:
## 
##    name               description                          
## 1  "$eig"             "eigenvalues"                        
## 2  "$var"             "results for the variables"          
## 3  "$var$coord"       "coord. for the variables"           
## 4  "$var$cor"         "correlations variables - dimensions"
## 5  "$var$cos2"        "cos2 for the variables"             
## 6  "$var$contrib"     "contributions of the variables"     
## 7  "$ind"             "results for the individuals"        
## 8  "$ind$coord"       "coord. for the individuals"         
## 9  "$ind$cos2"        "cos2 for the individuals"           
## 10 "$ind$contrib"     "contributions of the individuals"   
## 11 "$call"            "summary statistics"                 
## 12 "$call$centre"     "mean of the variables"              
## 13 "$call$ecart.type" "standard error of the variables"    
## 14 "$call$row.w"      "weights for the individuals"        
## 15 "$call$col.w"      "weights for the variables"

The object returned by PCA() is a list whose elements hold the eigenvalues, the results for the variables, and the results for the individuals. These elements are described in the following sections.

Variances of the Principal Components

The proportion of variance retained by each principal component can be extracted as follows:

eigenvalues <- res.pca$eig
head(eigenvalues[, 1:2])
##        eigenvalue percentage of variance
## comp 1  7.0182722              43.864201
## comp 2  2.8940054              18.087534
## comp 3  2.7279418              17.049637
## comp 4  1.2243686               7.652304
## comp 5  0.6023628               3.764767
## comp 6  0.5353847               3.346155
  • Eigenvalues correspond to the amount of the variation explained by each principal component (PC).
  • Eigenvalues are large for the first PC and small for the subsequent PCs.
  • A PC with an eigenvalue > 1 accounts for more variance than any single original variable in the standardized data, since each standardized variable has a variance of 1.
  • This is commonly used as a cutoff point to determine the number of PCs to retain.
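
The arithmetic behind these points can be checked by hand. The sketch below re-enters the six eigenvalues printed above (the variable names eig, n_vars, pct, cum_pct, and n_keep are only illustrative) and relies on the fact that, with 16 standardized variables, the total variance equals 16.

```r
# Eigenvalues copied from the first six rows of res.pca$eig shown above
eig <- c(7.0182722, 2.8940054, 2.7279418, 1.2243686, 0.6023628, 0.5353847)

# With standardized data, total variance equals the number of variables
n_vars <- 16

pct     <- 100 * eig / n_vars  # percentage of variance per component
cum_pct <- cumsum(pct)         # cumulative percentage of variance
n_keep  <- sum(eig > 1)        # components passing the eigenvalue > 1 cutoff

print(round(data.frame(eigenvalue = eig, pct = pct, cum_pct = cum_pct), 2))
print(n_keep)
```

Applied to these values, the eigenvalue > 1 cutoff would retain the first four components.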

We can visualize this by creating a scree plot using base R graphics:

  • A scree plot is a graph of the eigenvalues/variances associated with components.
barplot(eigenvalues[, 2], names.arg=1:nrow(eigenvalues), 
        main = "Variances",
        xlab = "Principal Components",
        ylab = "Percentage of variances",
        col ="steelblue")

# Add connected line segments to the plot
lines(x = 1:nrow(eigenvalues), eigenvalues[, 2], 
      type="b", pch=19, col = "red")

  • Note that ~62% of the information (variance) contained in the data is retained by the first two principal components (43.9% + 18.1%).

We can also make the scree plot using the package factoextra:

fviz_screeplot(res.pca, ncp=10)

Graph of Individual Rows and Variables

The function plot.PCA() can be used. A simplified format is:

plot.PCA(x, axes = c(1,2), choix = c("ind", "var"))

  • x: An object of class PCA
  • axes: A numeric vector of length 2 specifying the components to plot
  • choix: The graph to be plotted. Possible values are “ind” for the individuals and “var” for the variables
PCA(gallup_vars.active, scale.unit=TRUE, ncp=5, ind.sup=NULL,
    quanti.sup=NULL, quali.sup=NULL, graph=TRUE, axes = c(1,2))

## **Results for the Principal Component Analysis (PCA)**
## The analysis was performed on 163 individuals, described by 16 variables
## *The results are available in the following objects:
## 
##    name               description                          
## 1  "$eig"             "eigenvalues"                        
## 2  "$var"             "results for the variables"          
## 3  "$var$coord"       "coord. for the variables"           
## 4  "$var$cor"         "correlations variables - dimensions"
## 5  "$var$cos2"        "cos2 for the variables"             
## 6  "$var$contrib"     "contributions of the variables"     
## 7  "$ind"             "results for the individuals"        
## 8  "$ind$coord"       "coord. for the individuals"         
## 9  "$ind$cos2"        "cos2 for the individuals"           
## 10 "$ind$contrib"     "contributions of the individuals"   
## 11 "$call"            "summary statistics"                 
## 12 "$call$centre"     "mean of the variables"              
## 13 "$call$ecart.type" "standard error of the variables"    
## 14 "$call$row.w"      "weights for the individuals"        
## 15 "$call$col.w"      "weights for the variables"
res <- PCA(gallup_vars.active, graph = FALSE)

Variables Factor Map - the Correlation Circle

The coordinates of the variables on the principal components can be extracted as follows:

head(res.pca$var$coord)
##                                    Dim.1       Dim.2      Dim.3
## CongressionalApproval          0.5569729  0.36359255  0.6708696
## CongressionalDisapproval      -0.5426159 -0.32350952 -0.6891506
## GOPApproval                    0.5231316  0.18698744  0.4914883
## IndependentPartyApproval      -0.3537076 -0.32414902 -0.3486855
## DemocraticPartyIdentification  0.1117742 -0.08700543  0.6718603
## GoodTimeFindQualityJob         0.7898765 -0.48119899 -0.1360081
##                                     Dim.4       Dim.5
## CongressionalApproval         -0.13978824 -0.13503294
## CongressionalDisapproval       0.17709650  0.15867504
## GOPApproval                    0.53262682 -0.21361578
## IndependentPartyApproval       0.66510093  0.05587073
## DemocraticPartyIdentification  0.60724381  0.20051545
## GoodTimeFindQualityJob        -0.07762488  0.12697547

Viewing the Quality of the Variables on the Factor Map using Cos2

The quality of representation of the variables on the principal components is called the cos2 (squared cosine). For each variable, it equals the square of its coordinate on the component.

head(res.pca$var$cos2)
##                                    Dim.1       Dim.2      Dim.3
## CongressionalApproval         0.31021886 0.132199539 0.45006605
## CongressionalDisapproval      0.29443205 0.104658411 0.47492860
## GOPApproval                   0.27366670 0.034964304 0.24156077
## IndependentPartyApproval      0.12510904 0.105072586 0.12158156
## DemocraticPartyIdentification 0.01249348 0.007569945 0.45139623
## GoodTimeFindQualityJob        0.62390488 0.231552467 0.01849821
##                                     Dim.4       Dim.5
## CongressionalApproval         0.019540752 0.018233895
## CongressionalDisapproval      0.031363169 0.025177769
## GOPApproval                   0.283691325 0.045631701
## IndependentPartyApproval      0.442359243 0.003121538
## DemocraticPartyIdentification 0.368745050 0.040206446
## GoodTimeFindQualityJob        0.006025622 0.016122770
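
As a quick check of how cos2 relates to the coordinates, the sketch below squares the coordinates printed earlier for CongressionalApproval and compares them with the cos2 values above. The numbers are copied from the output; the object names coord, cos2, and quality_5d are only illustrative.

```r
# Coordinates of CongressionalApproval on Dim.1 to Dim.5, copied from res.pca$var$coord
coord <- c(Dim.1 = 0.5569729, Dim.2 = 0.36359255, Dim.3 = 0.6708696,
           Dim.4 = -0.13978824, Dim.5 = -0.13503294)

# cos2 is the squared coordinate on each dimension
cos2 <- coord^2
print(round(cos2, 4))

# Summing cos2 across dimensions gives the quality of representation
# of the variable in the space spanned by those dimensions
quality_5d <- sum(cos2)
print(round(quality_5d, 4))
```

A sum close to 1 means the variable is well represented by the retained dimensions.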

Contributions of the Variables to the Principal Components

The contribution of a variable to a given principal component is computed (as a percentage) by:

(var.cos2 * 100) / (total cos2 of the component)

head(res.pca$var$contrib)
##                                   Dim.1     Dim.2      Dim.3      Dim.4
## CongressionalApproval         4.4201601 4.5680474 16.4983740  1.5959860
## CongressionalDisapproval      4.1952213 3.6163862 17.4097773  2.5615790
## GOPApproval                   3.8993458 1.2081631  8.8550558 23.1704181
## IndependentPartyApproval      1.7826188 3.6306977  4.4568971 36.1295806
## DemocraticPartyIdentification 0.1780136 0.2615733 16.5471355 30.1171597
## GoodTimeFindQualityJob        8.8897220 8.0011070  0.6781013  0.4921411
##                                   Dim.5
## CongressionalApproval         3.0270623
## CongressionalDisapproval      4.1798350
## GOPApproval                   7.5754520
## IndependentPartyApproval      0.5182157
## DemocraticPartyIdentification 6.6747896
## GoodTimeFindQualityJob        2.6765882
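
The formula can be verified against the output above. In this sketch the cos2 values for Dim.1 and the first eigenvalue are copied from the earlier output; it assumes that the total cos2 of a component equals its eigenvalue, which is consistent with the printed numbers. The object names are only illustrative.

```r
# cos2 on Dim.1 for the six variables shown above (copied from res.pca$var$cos2)
cos2_dim1 <- c(CongressionalApproval         = 0.31021886,
               CongressionalDisapproval      = 0.29443205,
               GOPApproval                   = 0.27366670,
               IndependentPartyApproval      = 0.12510904,
               DemocraticPartyIdentification = 0.01249348,
               GoodTimeFindQualityJob        = 0.62390488)

# Eigenvalue of the first component (taken as the total cos2 over all 16 variables)
eig_dim1 <- 7.0182722

# Contribution of each variable to Dim.1, in percent
contrib_dim1 <- 100 * cos2_dim1 / eig_dim1
print(round(contrib_dim1, 4))

# If all 16 variables contributed equally, each would contribute 100 / 16 percent
uniform_contrib <- 100 / 16
print(names(contrib_dim1)[contrib_dim1 > uniform_contrib])
```

Comparing each contribution with the uniform value of 6.25% is one simple way to flag the variables that matter most for a component.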

Graph of the Variables using the FactoMineR Base Graph

plot(res.pca, choix = "var")

Graph of the Variables using factoextra

The function fviz_pca_var() is used to visualize the variables:

# Default plot
fviz_pca_var(res.pca)

# Change color and theme
fviz_pca_var(res.pca, col.var="steelblue")+
  theme_minimal()

  • Note that with the factoextra package, the color or the transparency of the variables can be controlled automatically by their contributions, their cos2, or their coordinates on the x or y axis.
  • This is helpful to highlight the most important variables in the determination of the principal components.
# Control variable colors using their contribution
# Possible values for the argument col.var are:
# "cos2", "contrib", "coord", "x", "y"
fviz_pca_var(res.pca, col.var="contrib")

# Change the gradient color
fviz_pca_var(res.pca, col.var="contrib") +
  scale_color_gradient2(low="white", mid="blue",
                        high="red", midpoint=55) +
  theme_bw()

  • It is also possible to automatically control the transparency of variables by their contributions:
# Control the transparency of variables using their contribution
# Possible values for the argument alpha.var are:
# "cos2", "contrib", "coord", "x", "y"
fviz_pca_var(res.pca, alpha.var="contrib")+
  theme_minimal()

  • Construct a biplot of the individuals and variables:
fviz_pca_biplot(res.pca,  geom = "text")

Dimension Description

The function dimdesc() can be used to identify the variables most strongly correlated with a given principal component.

A simplified format is:

dimdesc(res, axes = 1:3, proba = 0.05)

  • res: an object of class PCA
  • axes: a numeric vector specifying the dimensions to be described
  • proba: the significance level used to decide which variables are reported

Here is an example of usage:

res.desc <- dimdesc(res.pca, axes = c(1,2))
# Description of dimension 1
res.desc$Dim.1
## $quanti
##                              correlation      p.value
## EconomyGood                    0.9641906 1.026975e-94
## EconomicConfidenceIndex        0.9036939 3.411436e-61
## GoodTimeFindQualityJob         0.7898765 5.130578e-36
## EconomyExcellent               0.7557959 2.049964e-31
## CongressionalApproval          0.5569729 1.154894e-14
## GOPApproval                    0.5231316 7.832102e-13
## NationalEconomyGettingBetter   0.4724129 1.930925e-10
## EconomyFair                    0.3983387 1.393471e-07
## IndependentPartyApproval      -0.3537076 3.626766e-06
## CongressionalDisapproval      -0.5426159 7.320973e-14
## NationalEconomyGettingWorse   -0.5867922 1.852951e-16
## BadTimeFindQualityJob         -0.7762424 4.447971e-34
## EconIssuesMostImpProblem      -0.8877230 4.076556e-56
## EconomyPoor                   -0.9687478 2.146154e-99
# Description of dimension 2
res.desc$Dim.2
## $quanti
##                              correlation      p.value
## PresidentialApproval           0.8307182 8.343893e-43
## NationalEconomyGettingBetter   0.6969911 4.991305e-25
## BadTimeFindQualityJob          0.5039352 7.010668e-12
## CongressionalApproval          0.3635925 1.836611e-06
## EconIssuesMostImpProblem       0.3159317 3.987068e-05
## EconomicConfidenceIndex        0.3142492 4.404528e-05
## GOPApproval                    0.1869874 1.684640e-02
## CongressionalDisapproval      -0.3235095 2.527312e-05
## IndependentPartyApproval      -0.3241490 2.430559e-05
## EconomyExcellent              -0.3780998 6.488223e-07
## GoodTimeFindQualityJob        -0.4811990 7.925669e-11
## NationalEconomyGettingWorse   -0.6801663 1.776039e-23
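
The kind of table that dimdesc() returns can be approximated by hand: correlate each variable with the scores on a component, test the correlation, and keep the variables whose p-value is below the significance level. The sketch below is an assumption-laden illustration, not the internals of dimdesc(); it uses the built-in USArrests data and base R prcomp() and cor.test() rather than the Gallup data and FactoMineR.

```r
# Illustration only: built-in data and base R functions, not FactoMineR
data(USArrests)
pca <- prcomp(USArrests, scale. = TRUE)

# Scores of the observations on the first principal component
scores_dim1 <- pca$x[, 1]

# Correlation of each variable with the component, plus its p-value
desc <- t(sapply(USArrests, function(v) {
  ct <- cor.test(v, scores_dim1)
  c(correlation = unname(ct$estimate), p.value = ct$p.value)
}))

# Keep significant variables and sort by correlation, as in the output above
proba <- 0.05
desc <- desc[desc[, "p.value"] < proba, , drop = FALSE]
desc <- desc[order(desc[, "correlation"], decreasing = TRUE), , drop = FALSE]
print(desc)
```

Variables with large positive or negative correlations are the ones that best characterize the component.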

Automatic PCA with FactoInvestigate

The package FactoInvestigate generates an automatic report that interprets FactoMineR-based principal component analyses. The main function provided by the package is Investigate(), which can be used to create a Word, PDF, or HTML report.

The report includes the following items:

  • detection of existing outliers
  • identification of the major first principal components
  • plots
  • description of dimensions
library(FactoInvestigate)
#Investigate(res,document="pdf_document",
#            parallel = TRUE)

Evaluation

Summary

References