Packages

library(psych)
library(corrplot)
## Warning: package 'corrplot' was built under R version 4.1.3
## corrplot 0.92 loaded
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.1.3
## Warning: package 'ggplot2' was built under R version 4.1.3
## Warning: package 'tibble' was built under R version 4.1.3
## Warning: package 'tidyr' was built under R version 4.1.3
## Warning: package 'readr' was built under R version 4.1.3
## Warning: package 'purrr' was built under R version 4.1.3
## Warning: package 'dplyr' was built under R version 4.1.3
## Warning: package 'stringr' was built under R version 4.1.3
## Warning: package 'forcats' was built under R version 4.1.3
## Warning: package 'lubridate' was built under R version 4.1.3
## -- Attaching core tidyverse packages ------------------------ tidyverse 2.0.0 --
## v dplyr     1.1.2     v readr     2.1.4
## v forcats   1.0.0     v stringr   1.5.0
## v ggplot2   3.4.2     v tibble    3.2.1
## v lubridate 1.9.2     v tidyr     1.3.0
## v purrr     1.0.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x ggplot2::%+%()   masks psych::%+%()
## x ggplot2::alpha() masks psych::alpha()
## x dplyr::filter()  masks stats::filter()
## x dplyr::lag()     masks stats::lag()
## i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(haven)
## Warning: package 'haven' was built under R version 4.1.3
library(MASS)
## 
## Attaching package: 'MASS'
## 
## The following object is masked from 'package:dplyr':
## 
##     select
library(survey)
## Loading required package: grid
## Loading required package: Matrix
## Warning: package 'Matrix' was built under R version 4.1.3
## 
## Attaching package: 'Matrix'
## 
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## 
## Loading required package: survival
## 
## Attaching package: 'survey'
## 
## The following object is masked from 'package:graphics':
## 
##     dotchart
library(Hmisc)
## Warning: package 'Hmisc' was built under R version 4.1.3
## 
## Attaching package: 'Hmisc'
## 
## The following object is masked from 'package:survey':
## 
##     deff
## 
## The following objects are masked from 'package:dplyr':
## 
##     src, summarize
## 
## The following object is masked from 'package:psych':
## 
##     describe
## 
## The following objects are masked from 'package:base':
## 
##     format.pval, units
library(stats)
library(skimr)
## Warning: package 'skimr' was built under R version 4.1.3

Import data

anes <- read_dta("anes_panel_2013_inetrecontact.dta")

Alpha analysis

# creating new data frames for each section
sdo_df <- data.frame(anes$C5_T1, anes$C5_T2, anes$C5_T3, anes$C5_T4)

rwa_df <- data.frame(anes$C5_U1, anes$C5_U2, anes$C5_U3, anes$C5_U4, anes$C5_U5)

# Assigning new names for easier readability
new_names_sdo <- c("Cur_equ", "Gro_equ", "ideal_equ", "push_equ")
new_names_rwa <- c("Open_minded", "free_think", "auth_control", "str_lead", "tradition")
# changing column names
colnames(sdo_df) <- new_names_sdo
colnames(rwa_df) <- new_names_rwa
# searching for missing data | showing values as low as -7.
skim(sdo_df)
Data summary
Name sdo_df
Number of rows 1635
Number of columns 4
_______________________
Column type frequency:
numeric 4
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Cur_equ 0 1 3.00 2.31 -7 3 3 5 5 ▁▁▁▂▇
Gro_equ 0 1 3.33 2.27 -7 3 4 5 5 ▁▁▁▁▇
ideal_equ 0 1 2.34 2.18 -7 2 3 3 5 ▁▁▁▆▇
push_equ 0 1 2.41 2.23 -7 2 3 4 5 ▁▁▁▇▇
skim(rwa_df)
Data summary
Name rwa_df
Number of rows 1635
Number of columns 5
_______________________
Column type frequency:
numeric 5
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Open_minded 0 1 1.88 2.06 -7 1 2 3 5 ▁▁▁▇▅
free_think 0 1 2.02 2.06 -7 2 2 3 5 ▁▁▁▇▆
auth_control 0 1 2.70 2.27 -7 2 3 4 5 ▁▁▁▃▇
str_lead 0 1 2.44 2.31 -7 2 3 4 5 ▁▁▁▅▇
tradition 0 1 2.28 2.16 -7 2 2 3 5 ▁▁▁▇▇
# replacing the negative codes (listed in the codebook, not real 1-5 answers) with NA
sdo_df[sdo_df <= -1] <- NA
rwa_df[rwa_df <= -1] <- NA

# checking if they changed. They did: the lowest value is now 1 and there are roughly 80 missing per question.
skim(sdo_df)
Data summary
Name sdo_df
Number of rows 1635
Number of columns 4
_______________________
Column type frequency:
numeric 4
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Cur_equ 82 0.95 3.46 1.14 1 3 3 5 5 ▁▂▇▂▅
Gro_equ 79 0.95 3.79 1.01 1 3 4 5 5 ▁▁▇▅▇
ideal_equ 82 0.95 2.77 1.16 1 2 3 3 5 ▃▆▇▃▂
push_equ 80 0.95 2.82 1.30 1 2 3 4 5 ▅▇▆▅▃
skim(rwa_df)
Data summary
Name rwa_df
Number of rows 1635
Number of columns 5
_______________________
Column type frequency:
numeric 5
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Open_minded 79 0.95 2.26 1.17 1 1 2 3 5 ▆▇▃▂▂
free_think 82 0.95 2.43 1.06 1 2 2 3 5 ▃▇▆▂▁
auth_control 82 0.95 3.14 1.20 1 2 3 4 5 ▂▆▇▆▅
str_lead 87 0.95 2.90 1.27 1 2 3 4 5 ▅▆▇▃▅
tradition 81 0.95 2.69 1.20 1 2 3 4 5 ▅▇▆▃▂
# checking correlation before flipping the scales
cor_matrix_sdo <- cor(sdo_df, use = "pairwise.complete.obs")
cor_table_sdo <- round(cor_matrix_sdo, 2)
print(cor_table_sdo)
##           Cur_equ Gro_equ ideal_equ push_equ
## Cur_equ      1.00    0.53     -0.11    -0.08
## Gro_equ      0.53    1.00     -0.23    -0.16
## ideal_equ   -0.11   -0.23      1.00     0.66
## push_equ    -0.08   -0.16      0.66     1.00
# again for rwa | the first two questions correlate with each other, and so do the last three. I think the two groups are touching slightly different concepts
cor_matrix_rwa <- cor(rwa_df, use = "pairwise.complete.obs")
cor_table_rwa <- round(cor_matrix_rwa, 2)
print(cor_table_rwa)
##              Open_minded free_think auth_control str_lead tradition
## Open_minded         1.00       0.47        -0.13    -0.14     -0.19
## free_think          0.47       1.00        -0.23    -0.19     -0.29
## auth_control       -0.13      -0.23         1.00     0.64      0.54
## str_lead           -0.14      -0.19         0.64     1.00      0.59
## tradition          -0.19      -0.29         0.54     0.59      1.00
# Alpha checks
psych::alpha(sdo_df, na.rm = TRUE, check.keys=FALSE) #Run the alpha calculation
## Warning in psych::alpha(sdo_df, na.rm = TRUE, check.keys = FALSE): Some items were negatively correlated with the total scale and probably 
## should be reversed.  
## To do this, run the function again with the 'check.keys=TRUE' option
## Some items ( Cur_equ Gro_equ ) were negatively correlated with the total scale and 
## probably should be reversed.  
## To do this, run the function again with the 'check.keys=TRUE' option
## 
## Reliability analysis   
## Call: psych::alpha(x = sdo_df, na.rm = TRUE, check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r  S/N   ase mean   sd median_r
##       0.32      0.31    0.52       0.1 0.45 0.029  3.2 0.67   -0.096
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.27  0.32  0.38
## Duhachek  0.27  0.32  0.38
## 
##  Reliability if an item is dropped:
##           raw_alpha std.alpha G6(smc) average_r  S/N alpha se var.r med.r
## Cur_equ        0.30      0.23    0.42     0.089 0.29    0.028  0.25 -0.16
## Gro_equ        0.38      0.36    0.47     0.158 0.56    0.026  0.19 -0.08
## ideal_equ      0.18      0.24    0.33     0.096 0.32    0.036  0.14 -0.08
## push_equ       0.15      0.17    0.30     0.063 0.20    0.037  0.17 -0.11
## 
##  Item statistics 
##              n raw.r std.r r.cor r.drop mean  sd
## Cur_equ   1553  0.55  0.59  0.38  0.137  3.5 1.1
## Gro_equ   1556  0.43  0.50  0.27  0.053  3.8 1.0
## ideal_equ 1553  0.62  0.58  0.47  0.233  2.8 1.2
## push_equ  1555  0.68  0.62  0.52  0.253  2.8 1.3
## 
## Non missing response frequency for each item
##              1    2    3    4    5 miss
## Cur_equ   0.05 0.11 0.44 0.13 0.27 0.05
## Gro_equ   0.01 0.06 0.39 0.20 0.33 0.05
## ideal_equ 0.15 0.25 0.36 0.14 0.09 0.05
## push_equ  0.17 0.29 0.23 0.16 0.15 0.05
psych::alpha(sdo_df, na.rm = TRUE, check.keys=TRUE) #Run the alpha calculation
## Warning in psych::alpha(sdo_df, na.rm = TRUE, check.keys = TRUE): Some items were negatively correlated with total scale and were automatically reversed.
##  This is indicated by a negative sign for the variable name.
## 
## Reliability analysis   
## Call: psych::alpha(x = sdo_df, na.rm = TRUE, check.keys = TRUE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase mean   sd median_r
##       0.63      0.63    0.67       0.3 1.7 0.016  2.6 0.79      0.2
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.60  0.63  0.65
## Duhachek  0.59  0.63  0.66
## 
##  Reliability if an item is dropped:
##           raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## Cur_equ-       0.63      0.62    0.60      0.35 1.6    0.015 0.074  0.23
## Gro_equ-       0.55      0.54    0.55      0.29 1.2    0.019 0.108  0.11
## ideal_equ      0.48      0.51    0.48      0.26 1.0    0.023 0.058  0.16
## push_equ       0.54      0.55    0.51      0.29 1.2    0.020 0.047  0.23
## 
##  Item statistics 
##              n raw.r std.r r.cor r.drop mean  sd
## Cur_equ-  1553  0.60  0.63  0.45   0.29  2.5 1.1
## Gro_equ-  1556  0.66  0.70  0.55   0.42  2.2 1.0
## ideal_equ 1553  0.75  0.73  0.65   0.50  2.8 1.2
## push_equ  1555  0.73  0.69  0.60   0.43  2.8 1.3
## 
## Non missing response frequency for each item
##              1    2    3    4    5 miss
## Cur_equ   0.05 0.11 0.44 0.13 0.27 0.05
## Gro_equ   0.01 0.06 0.39 0.20 0.33 0.05
## ideal_equ 0.15 0.25 0.36 0.14 0.09 0.05
## push_equ  0.17 0.29 0.23 0.16 0.15 0.05
psych::alpha(rwa_df, na.rm = TRUE, check.keys=FALSE) #Run the alpha calculation
## Warning in psych::alpha(rwa_df, na.rm = TRUE, check.keys = FALSE): Some items were negatively correlated with the total scale and probably 
## should be reversed.  
## To do this, run the function again with the 'check.keys=TRUE' option
## Some items ( Open_minded free_think ) were negatively correlated with the total scale and 
## probably should be reversed.  
## To do this, run the function again with the 'check.keys=TRUE' option
## 
## Reliability analysis   
## Call: psych::alpha(x = rwa_df, na.rm = TRUE, check.keys = FALSE)
## 
##   raw_alpha std.alpha G6(smc) average_r  S/N   ase mean   sd median_r
##       0.41      0.38    0.56      0.11 0.61 0.024  2.7 0.65    -0.13
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.36  0.41  0.45
## Duhachek  0.36  0.41  0.45
## 
##  Reliability if an item is dropped:
##              raw_alpha std.alpha G6(smc) average_r  S/N alpha se var.r med.r
## Open_minded       0.51      0.46    0.58     0.178 0.86    0.018  0.21  0.18
## free_think        0.54      0.53    0.61     0.219 1.12    0.018  0.17  0.21
## auth_control      0.17      0.15    0.39     0.043 0.18    0.035  0.15 -0.16
## str_lead          0.12      0.11    0.34     0.030 0.12    0.037  0.14 -0.16
## tradition         0.25      0.24    0.45     0.072 0.31    0.032  0.14 -0.13
## 
##  Item statistics 
##                 n raw.r std.r r.cor r.drop mean  sd
## Open_minded  1556  0.35  0.38  0.12 -0.018  2.3 1.2
## free_think   1553  0.24  0.29  0.02 -0.097  2.4 1.1
## auth_control 1553  0.70  0.68  0.64  0.426  3.1 1.2
## str_lead     1548  0.74  0.71  0.70  0.459  2.9 1.3
## tradition    1554  0.65  0.62  0.53  0.333  2.7 1.2
## 
## Non missing response frequency for each item
##                 1    2    3    4    5 miss
## Open_minded  0.30 0.37 0.18 0.09 0.07 0.05
## free_think   0.19 0.38 0.27 0.10 0.05 0.05
## auth_control 0.09 0.23 0.30 0.21 0.17 0.05
## str_lead     0.16 0.24 0.30 0.14 0.16 0.05
## tradition    0.17 0.32 0.26 0.16 0.10 0.05
psych::alpha(rwa_df, na.rm = TRUE, check.keys=TRUE) #Run the alpha calculation
## Warning in psych::alpha(rwa_df, na.rm = TRUE, check.keys = TRUE): Some items were negatively correlated with total scale and were automatically reversed.
##  This is indicated by a negative sign for the variable name.
## 
## Reliability analysis   
## Call: psych::alpha(x = rwa_df, na.rm = TRUE, check.keys = TRUE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase mean   sd median_r
##       0.72      0.72    0.73      0.34 2.6 0.011  3.2 0.82     0.26
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt      0.7  0.72  0.74
## Duhachek   0.7  0.72  0.74
## 
##  Reliability if an item is dropped:
##              raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## Open_minded-      0.75      0.74    0.72      0.41 2.8    0.010 0.040  0.42
## free_think-       0.71      0.70    0.69      0.37 2.4    0.012 0.059  0.37
## auth_control      0.64      0.64    0.64      0.31 1.8    0.015 0.033  0.24
## str_lead          0.64      0.64    0.63      0.31 1.8    0.015 0.027  0.26
## tradition         0.63      0.63    0.64      0.30 1.7    0.015 0.044  0.21
## 
##  Item statistics 
##                 n raw.r std.r r.cor r.drop mean  sd
## Open_minded- 1556  0.55  0.56  0.39   0.29  3.7 1.2
## free_think-  1553  0.61  0.63  0.49   0.40  3.6 1.1
## auth_control 1553  0.75  0.74  0.68   0.57  3.1 1.2
## str_lead     1548  0.76  0.74  0.69   0.57  2.9 1.3
## tradition    1554  0.77  0.76  0.69   0.60  2.7 1.2
## 
## Non missing response frequency for each item
##                 1    2    3    4    5 miss
## Open_minded  0.30 0.37 0.18 0.09 0.07 0.05
## free_think   0.19 0.38 0.27 0.10 0.05 0.05
## auth_control 0.09 0.23 0.30 0.21 0.17 0.05
## str_lead     0.16 0.24 0.30 0.14 0.16 0.05
## tradition    0.17 0.32 0.26 0.16 0.10 0.05
# let's flip the scales so that higher values point toward the latent concept, then check again | it looks like roughly 2 groups of items within each section

anes <- anes %>% 
  mutate(Cur_equ = case_when(
    C5_T1 ==1 ~ 5,
    C5_T1 ==2 ~ 4,
    C5_T1 ==3 ~ 3,
    C5_T1 ==4 ~ 2,
    C5_T1 ==5 ~ 1
  ))

anes <- anes %>% 
  mutate(gro_equ = case_when(
    C5_T2 ==1 ~ 5,
    C5_T2 ==2 ~ 4,
    C5_T2 ==3 ~ 3,
    C5_T2 ==4 ~ 2,
    C5_T2 ==5 ~ 1
  ))

anes <- anes %>% 
  mutate(ideal_equ = case_when(
    C5_T3 ==1 ~ 1,
    C5_T3 ==2 ~ 2,
    C5_T3 ==3 ~ 3,
    C5_T3 ==4 ~ 4,
    C5_T3 ==5 ~ 5
  ))

anes <- anes %>% 
  mutate(push_equ = case_when(
    C5_T4 ==1 ~ 1,
    C5_T4 ==2 ~ 2,
    C5_T4 ==3 ~ 3,
    C5_T4 ==4 ~ 4,
    C5_T4 ==5 ~ 5
  ))
# recoding and renaming the RWA items (the last three are reverse-coded)
anes <- anes %>% 
  mutate(open_minded_belief = case_when(
    C5_U1 ==1 ~ 1,
    C5_U1 ==2 ~ 2,
    C5_U1 ==3 ~ 3,
    C5_U1 ==4 ~ 4,
    C5_U1 ==5 ~ 5
  ))

anes <- anes %>% 
  mutate(free_belief = case_when(
    C5_U2 ==1 ~ 1,
    C5_U2 ==2 ~ 2,
    C5_U2 ==3 ~ 3,
    C5_U2 ==4 ~ 4,
    C5_U2 ==5 ~ 5
  ))

anes <- anes %>% 
  mutate(auth_control_belief = case_when(
    C5_U3 ==1 ~ 5,
    C5_U3 ==2 ~ 4,
    C5_U3 ==3 ~ 3,
    C5_U3 ==4 ~ 2,
    C5_U3 ==5 ~ 1
  ))

anes <- anes %>% 
  mutate(str_lead_belief = case_when(
    C5_U4 ==1 ~ 5,
    C5_U4 ==2 ~ 4,
    C5_U4 ==3 ~ 3,
    C5_U4 ==4 ~ 2,
    C5_U4 ==5 ~ 1
  ))

anes <- anes %>% 
  mutate(tradition_belief = case_when(
    C5_U5 ==1 ~ 5,
    C5_U5 ==2 ~ 4,
    C5_U5 ==3 ~ 3,
    C5_U5 ==4 ~ 2,
    C5_U5 ==5 ~ 1
  ))

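# checking again after the changes
# note: sdo_df and rwa_df were built from the raw C5_* columns before the recoding, so these calls
# do not see the recoded variables added to anes and reproduce the check.keys=TRUE results above
# first alpha check SDO group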
psych::alpha(sdo_df, na.rm = TRUE, check.keys=TRUE) #Run the alpha calculation
## Warning in psych::alpha(sdo_df, na.rm = TRUE, check.keys = TRUE): Some items were negatively correlated with total scale and were automatically reversed.
##  This is indicated by a negative sign for the variable name.
## 
## Reliability analysis   
## Call: psych::alpha(x = sdo_df, na.rm = TRUE, check.keys = TRUE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase mean   sd median_r
##       0.63      0.63    0.67       0.3 1.7 0.016  2.6 0.79      0.2
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.60  0.63  0.65
## Duhachek  0.59  0.63  0.66
## 
##  Reliability if an item is dropped:
##           raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## Cur_equ-       0.63      0.62    0.60      0.35 1.6    0.015 0.074  0.23
## Gro_equ-       0.55      0.54    0.55      0.29 1.2    0.019 0.108  0.11
## ideal_equ      0.48      0.51    0.48      0.26 1.0    0.023 0.058  0.16
## push_equ       0.54      0.55    0.51      0.29 1.2    0.020 0.047  0.23
## 
##  Item statistics 
##              n raw.r std.r r.cor r.drop mean  sd
## Cur_equ-  1553  0.60  0.63  0.45   0.29  2.5 1.1
## Gro_equ-  1556  0.66  0.70  0.55   0.42  2.2 1.0
## ideal_equ 1553  0.75  0.73  0.65   0.50  2.8 1.2
## push_equ  1555  0.73  0.69  0.60   0.43  2.8 1.3
## 
## Non missing response frequency for each item
##              1    2    3    4    5 miss
## Cur_equ   0.05 0.11 0.44 0.13 0.27 0.05
## Gro_equ   0.01 0.06 0.39 0.20 0.33 0.05
## ideal_equ 0.15 0.25 0.36 0.14 0.09 0.05
## push_equ  0.17 0.29 0.23 0.16 0.15 0.05
# second alpha check RWA group
psych::alpha(rwa_df, na.rm = TRUE, check.keys=TRUE) #Run the alpha calculation
## Warning in psych::alpha(rwa_df, na.rm = TRUE, check.keys = TRUE): Some items were negatively correlated with total scale and were automatically reversed.
##  This is indicated by a negative sign for the variable name.
## 
## Reliability analysis   
## Call: psych::alpha(x = rwa_df, na.rm = TRUE, check.keys = TRUE)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase mean   sd median_r
##       0.72      0.72    0.73      0.34 2.6 0.011  3.2 0.82     0.26
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt      0.7  0.72  0.74
## Duhachek   0.7  0.72  0.74
## 
##  Reliability if an item is dropped:
##              raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## Open_minded-      0.75      0.74    0.72      0.41 2.8    0.010 0.040  0.42
## free_think-       0.71      0.70    0.69      0.37 2.4    0.012 0.059  0.37
## auth_control      0.64      0.64    0.64      0.31 1.8    0.015 0.033  0.24
## str_lead          0.64      0.64    0.63      0.31 1.8    0.015 0.027  0.26
## tradition         0.63      0.63    0.64      0.30 1.7    0.015 0.044  0.21
## 
##  Item statistics 
##                 n raw.r std.r r.cor r.drop mean  sd
## Open_minded- 1556  0.55  0.56  0.39   0.29  3.7 1.2
## free_think-  1553  0.61  0.63  0.49   0.40  3.6 1.1
## auth_control 1553  0.75  0.74  0.68   0.57  3.1 1.2
## str_lead     1548  0.76  0.74  0.69   0.57  2.9 1.3
## tradition    1554  0.77  0.76  0.69   0.60  2.7 1.2
## 
## Non missing response frequency for each item
##                 1    2    3    4    5 miss
## Open_minded  0.30 0.37 0.18 0.09 0.07 0.05
## free_think   0.19 0.38 0.27 0.10 0.05 0.05
## auth_control 0.09 0.23 0.30 0.21 0.17 0.05
## str_lead     0.16 0.24 0.30 0.14 0.16 0.05
## tradition    0.17 0.32 0.26 0.16 0.10 0.05
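
A minimal sketch of what these reliability checks compute, using made-up toy data rather than the ANES items. `cronbach_alpha()` and `reverse_5pt()` are hypothetical helpers written for illustration, not psych functions, and the helper keeps complete cases only, which is an assumption and may differ from how psych::alpha treats missing values. The sketch shows that reversing only the oppositely worded items changes alpha, while reversing every item leaves it unchanged, so the effect of a recoding only shows up when alpha is computed on the recoded columns.

```r
# Cronbach's alpha from the item covariance matrix (complete cases only).
cronbach_alpha <- function(items) {
  items <- stats::na.omit(as.data.frame(items))
  k <- ncol(items)
  v <- stats::var(items)
  (k / (k - 1)) * (1 - sum(diag(v)) / sum(v))
}

# Reverse a 1-5 item: 1 <-> 5, 2 <-> 4, 3 stays 3
# (the same mapping as the reversing case_when() calls above).
reverse_5pt <- function(x) 6 - x

# Toy data (invented): q3 and q4 are worded in the opposite direction to q1 and q2.
toy <- data.frame(
  q1 = c(1, 2, 3, 4, 5, 2, 4),
  q2 = c(2, 2, 3, 5, 5, 1, 4),
  q3 = c(5, 4, 3, 2, 1, 4, 2),
  q4 = c(4, 5, 3, 1, 1, 5, 2)
)

# Alpha on the items as collected.
alpha_raw <- cronbach_alpha(toy)

# Alpha after reversing only the oppositely worded items.
toy_fixed <- toy
toy_fixed$q3 <- reverse_5pt(toy$q3)
toy_fixed$q4 <- reverse_5pt(toy$q4)
alpha_fixed <- cronbach_alpha(toy_fixed)

# Reversing every item flips the direction of the scale but not its covariances.
toy_all_reversed <- as.data.frame(lapply(toy_fixed, reverse_5pt))
alpha_all_reversed <- cronbach_alpha(toy_all_reversed)

print(c(raw = alpha_raw, fixed = alpha_fixed, all_reversed = alpha_all_reversed))
```

Which direction to reverse is a substantive choice about what a high score should mean; it does not affect the reliability estimate as long as all items end up pointing the same way.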

Factor Analysis

Merging the data together

survey_df <- data.frame(anes$C5_T1, anes$C5_T2, anes$C5_T3, anes$C5_T4, anes$C5_U1, anes$C5_U2, anes$C5_U3, anes$C5_U4, anes$C5_U5)

# Assigning new names for easier readability
Surv_names <- c("Cur_equ", "Gro_equ", "ideal_equ", "push_equ", "Open_minded", "free_think", "auth_control", "str_lead", "tradition")
# changing column names
colnames(survey_df) <- Surv_names
# recoding the negative codes (values as low as -7) to NA, then checking for missing data
survey_df[survey_df <= -1] <- NA
skim(survey_df)
Data summary
Name survey_df
Number of rows 1635
Number of columns 9
_______________________
Column type frequency:
numeric 9
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Cur_equ 82 0.95 3.46 1.14 1 3 3 5 5 ▁▂▇▂▅
Gro_equ 79 0.95 3.79 1.01 1 3 4 5 5 ▁▁▇▅▇
ideal_equ 82 0.95 2.77 1.16 1 2 3 3 5 ▃▆▇▃▂
push_equ 80 0.95 2.82 1.30 1 2 3 4 5 ▅▇▆▅▃
Open_minded 79 0.95 2.26 1.17 1 1 2 3 5 ▆▇▃▂▂
free_think 82 0.95 2.43 1.06 1 2 2 3 5 ▃▇▆▂▁
auth_control 82 0.95 3.14 1.20 1 2 3 4 5 ▂▆▇▆▅
str_lead 87 0.95 2.90 1.27 1 2 3 4 5 ▅▆▇▃▅
tradition 81 0.95 2.69 1.20 1 2 3 4 5 ▅▇▆▃▂

Looking at the correlations below, I can see two groups so far: ideal_equ, push_equ, Open_minded, and free_think correlate positively with each other, while Cur_equ, Gro_equ, auth_control, str_lead, and tradition correlate positively with each other and negatively with the first group.

# step 1 correlation
cor_matrix<-cor(survey_df, use = "pairwise.complete.obs") #Saves correlation matrix
corrplot(cor_matrix, method = "circle") # Plot correlation matrix as a heatmap

cor_matrix
##                  Cur_equ     Gro_equ   ideal_equ    push_equ Open_minded
## Cur_equ       1.00000000  0.53235968 -0.11161483 -0.07962159 -0.11767553
## Gro_equ       0.53235968  1.00000000 -0.23112508 -0.16475508 -0.08879902
## ideal_equ    -0.11161483 -0.23112508  1.00000000  0.66428036  0.14381370
## push_equ     -0.07962159 -0.16475508  0.66428036  1.00000000  0.12539220
## Open_minded  -0.11767553 -0.08879902  0.14381370  0.12539220  1.00000000
## free_think   -0.16573160 -0.14469435  0.24794998  0.24654698  0.47349345
## auth_control  0.46432351  0.33911978 -0.06344072 -0.05790123 -0.12543316
## str_lead      0.43537169  0.30365579 -0.09340843 -0.08687152 -0.13724358
## tradition     0.40473010  0.29184744 -0.18738288 -0.21404840 -0.19062737
##              free_think auth_control    str_lead  tradition
## Cur_equ      -0.1657316   0.46432351  0.43537169  0.4047301
## Gro_equ      -0.1446944   0.33911978  0.30365579  0.2918474
## ideal_equ     0.2479500  -0.06344072 -0.09340843 -0.1873829
## push_equ      0.2465470  -0.05790123 -0.08687152 -0.2140484
## Open_minded   0.4734934  -0.12543316 -0.13724358 -0.1906274
## free_think    1.0000000  -0.22871210 -0.18540813 -0.2892724
## auth_control -0.2287121   1.00000000  0.63747253  0.5417001
## str_lead     -0.1854081   0.63747253  1.00000000  0.5886413
## tradition    -0.2892724   0.54170008  0.58864126  1.0000000

The scree plot starts to flatten out at 5 factors, but the fifth is already below the eigenvalue-of-1 line, so 4 looks more accurate. I will keep 5 in the back of my mind for now.

# step 2 screeplot 

scree(survey_df)
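
The "above the 1 line" rule can also be checked numerically, since it compares the eigenvalues of the item correlation matrix with 1. A minimal sketch, assuming the pairwise correlations printed above rounded to two decimals, so the values only approximate what scree(survey_df) plots. The object names (`items`, `R`, `ev`, `n_kaiser`) are introduced here for illustration.

```r
items <- c("Cur_equ", "Gro_equ", "ideal_equ", "push_equ", "Open_minded",
           "free_think", "auth_control", "str_lead", "tradition")

# Start from an identity matrix and fill the upper triangle column by column
# with the rounded correlations from the printed matrix.
R <- diag(9)
dimnames(R) <- list(items, items)
R[upper.tri(R)] <- c(
   0.53,                                                   # Gro_equ
  -0.11, -0.23,                                            # ideal_equ
  -0.08, -0.16,  0.66,                                     # push_equ
  -0.12, -0.09,  0.14,  0.13,                              # Open_minded
  -0.17, -0.14,  0.25,  0.25,  0.47,                       # free_think
   0.46,  0.34, -0.06, -0.06, -0.13, -0.23,                # auth_control
   0.44,  0.30, -0.09, -0.09, -0.14, -0.19,  0.64,         # str_lead
   0.40,  0.29, -0.19, -0.21, -0.19, -0.29,  0.54,  0.59   # tradition
)
# Mirror the upper triangle into the lower triangle so R is symmetric.
R[lower.tri(R)] <- t(R)[lower.tri(R)]

# Eigenvalues in decreasing order; they always sum to the number of items.
ev <- eigen(R, symmetric = TRUE)$values

# Kaiser criterion: count the eigenvalues greater than 1.
n_kaiser <- sum(ev > 1)

print(round(ev, 2))
print(n_kaiser)
```

The count is only a rule of thumb, which is why it is weighed here against the shape of the scree plot and the interpretability of the resulting components.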

I am going to use the principal component factoring (PCF) approach with a varimax rotation, since this is an exploratory analysis.

## factor analysis part

pcf_result <- principal(survey_df,nfactors = 4,  rotate = "varimax") 
pcf_result
## Principal Components Analysis
## Call: principal(r = survey_df, nfactors = 4, rotate = "varimax")
## Standardized loadings (pattern matrix) based upon correlation matrix
##                RC1   RC2   RC4   RC3   h2   u2 com
## Cur_equ       0.39  0.01  0.76 -0.08 0.74 0.26 1.5
## Gro_equ       0.14 -0.16  0.89 -0.04 0.83 0.17 1.1
## ideal_equ    -0.03  0.89 -0.13  0.11 0.82 0.18 1.1
## push_equ     -0.08  0.90 -0.02  0.09 0.83 0.17 1.0
## Open_minded  -0.05  0.01 -0.05  0.88 0.78 0.22 1.0
## free_think   -0.18  0.20 -0.05  0.80 0.71 0.29 1.2
## auth_control  0.81  0.03  0.25 -0.08 0.73 0.27 1.2
## str_lead      0.86 -0.02  0.17 -0.05 0.77 0.23 1.1
## tradition     0.80 -0.18  0.12 -0.16 0.71 0.29 1.2
## 
##                        RC1  RC2  RC4  RC3
## SS loadings           2.24 1.71 1.50 1.48
## Proportion Var        0.25 0.19 0.17 0.16
## Cumulative Var        0.25 0.44 0.61 0.77
## Proportion Explained  0.32 0.25 0.22 0.21
## Cumulative Proportion 0.32 0.57 0.79 1.00
## 
## Mean item complexity =  1.2
## Test of the hypothesis that 4 components are sufficient.
## 
## The root mean square of the residuals (RMSR) is  0.07 
##  with the empirical chi square  625.62  with prob <  6.9e-132 
## 
## Fit based upon off diagonal values = 0.95
# when this is run with 5 factors it only splits Open_minded and free_think, which I believe go together, so I am going to stick with 4.
fa.diagram(pcf_result)
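
The summary rows of the output above follow directly from the loading matrix. A minimal sketch, assuming the loadings as printed (rounded to two decimals, so the results match the reported values only approximately); the object names are introduced here for illustration. Column sums of squared loadings give the "SS loadings" row, row sums give the communalities (h2), and the largest absolute loading in each row shows which component an item belongs to.

```r
items <- c("Cur_equ", "Gro_equ", "ideal_equ", "push_equ", "Open_minded",
           "free_think", "auth_control", "str_lead", "tradition")

# Rotated loadings copied from the principal() output, in its column order.
loadings <- matrix(c(
   0.39,  0.01,  0.76, -0.08,
   0.14, -0.16,  0.89, -0.04,
  -0.03,  0.89, -0.13,  0.11,
  -0.08,  0.90, -0.02,  0.09,
  -0.05,  0.01, -0.05,  0.88,
  -0.18,  0.20, -0.05,  0.80,
   0.81,  0.03,  0.25, -0.08,
   0.86, -0.02,  0.17, -0.05,
   0.80, -0.18,  0.12, -0.16
), ncol = 4, byrow = TRUE,
dimnames = list(items, c("RC1", "RC2", "RC4", "RC3")))

# Variance accounted for by each rotated component ("SS loadings").
ss_loadings <- colSums(loadings^2)

# Share of each item's variance captured by the four components (h2).
communality <- rowSums(loadings^2)

# Component on which each item has its largest absolute loading.
assigned <- colnames(loadings)[apply(abs(loadings), 1, which.max)]
names(assigned) <- items

print(round(ss_loadings, 2))
print(round(communality, 2))
print(assigned)
```

Grouping items by their largest loading is the same reading that fa.diagram() draws, so the table and the diagram should agree.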

Creating the variables by concept

# Authoritarian 
anes <- anes %>%
  mutate(Authoritarian = (auth_control_belief + str_lead_belief + tradition_belief) / 3) 

summary(anes$Authoritarian)  
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.000   2.333   3.000   3.089   4.000   5.000      90
anes %>% 
  count(Authoritarian)
## # A tibble: 14 x 2
##    Authoritarian     n
##            <dbl> <int>
##  1          1       72
##  2          1.33    60
##  3          1.67    89
##  4          2       97
##  5          2.33   113
##  6          2.67   130
##  7          3      265
##  8          3.33   163
##  9          3.67   162
## 10          4      139
## 11          4.33   102
## 12          4.67    81
## 13          5       72
## 14         NA       90
df_auth <- data.frame(anes$auth_control_belief, anes$str_lead_belief, anes$tradition_belief, anes$Authoritarian)

head(df_auth)
##   anes.auth_control_belief anes.str_lead_belief anes.tradition_belief
## 1                        2                    2                     2
## 2                        3                    1                     2
## 3                        2                    3                     4
## 4                        4                    5                     3
## 5                        5                    5                     5
## 6                        3                    1                     3
##   anes.Authoritarian
## 1           2.000000
## 2           2.000000
## 3           3.000000
## 4           4.000000
## 5           5.000000
## 6           2.333333
# Socialist equality
anes <- anes %>%
  mutate(Socialist = (ideal_equ + push_equ) / 2) 

summary(anes$Socialist) 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.000   2.000   3.000   2.796   3.500   5.000      84
anes %>% 
  count(Socialist)
## # A tibble: 10 x 2
##    Socialist     n
##        <dbl> <int>
##  1       1     158
##  2       1.5   106
##  3       2     278
##  4       2.5   221
##  5       3     331
##  6       3.5   120
##  7       4     155
##  8       4.5    65
##  9       5     117
## 10      NA      84
df_social <- data.frame(anes$ideal_equ, anes$push_equ, anes$Socialist)

head(df_social)
##   anes.ideal_equ anes.push_equ anes.Socialist
## 1              3             2            2.5
## 2              2             3            2.5
## 3              3             4            3.5
## 4              1             2            1.5
## 5              5             5            5.0
## 6              4             5            4.5
# current gov
anes <- anes %>%
  mutate(current_gov = (Cur_equ + gro_equ) / 2) 
summary(anes$current_gov) 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.000   1.500   2.500   2.379   3.000   5.000      84
anes %>% 
  count(current_gov)
## # A tibble: 10 x 2
##    current_gov     n
##          <dbl> <int>
##  1         1     330
##  2         1.5   112
##  3         2     201
##  4         2.5   189
##  5         3     513
##  6         3.5   122
##  7         4      64
##  8         4.5     9
##  9         5      11
## 10        NA      84
df_curr <- data.frame(anes$Cur_equ, anes$gro_equ, anes$current_gov)

head(df_curr)
##   anes.Cur_equ anes.gro_equ anes.current_gov
## 1            2            1              1.5
## 2            4            3              3.5
## 3            3            3              3.0
## 4            3            1              2.0
## 5            4            3              3.5
## 6            2            3              2.5
# democracy
anes <- anes %>%
  mutate(Democrat = (open_minded_belief + free_belief) / 2) 

summary(anes$Democrat) 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.000   1.500   2.000   2.348   3.000   5.000      83
anes %>% 
  count(Democrat)
## # A tibble: 10 x 2
##    Democrat     n
##       <dbl> <int>
##  1      1     201
##  2      1.5   239
##  3      2     373
##  4      2.5   224
##  5      3     290
##  6      3.5    87
##  7      4      72
##  8      4.5    29
##  9      5      37
## 10     NA      83
df_demo <- data.frame(anes$open_minded_belief, anes$free_belief, anes$Democrat)

head(df_demo)
##   anes.open_minded_belief anes.free_belief anes.Democrat
## 1                       2                2           2.0
## 2                       3                3           3.0
## 3                       1                2           1.5
## 4                       2                3           2.5
## 5                       5                1           3.0
## 6                       5                5           5.0
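
The composites above are item means, and a respondent who skipped any item in a set gets NA. A minimal sketch of the same calculation with rowMeans() on made-up toy rows (not ANES data), which also shows the more lenient alternative of averaging whatever items were answered. The `Authoritarian_lenient` column is hypothetical and is not used anywhere in this analysis.

```r
# Toy rows (invented); the third respondent skipped one item.
toy <- data.frame(
  auth_control_belief = c(2, 3, NA, 5),
  str_lead_belief     = c(2, 1, 3, 5),
  tradition_belief    = c(2, 2, 4, 5)
)
auth_items <- c("auth_control_belief", "str_lead_belief", "tradition_belief")

# Same result as (a + b + c) / 3: NA whenever any item is missing.
toy$Authoritarian <- rowMeans(toy[, auth_items], na.rm = FALSE)

# Lenient alternative: mean of the items that were answered.
toy$Authoritarian_lenient <- rowMeans(toy[, auth_items], na.rm = TRUE)

print(toy)
```

The strict version keeps every score on the same set of items; the lenient version keeps more respondents at the cost of scores built from fewer items.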

Answers

Question 1

Test Cronbach’s alpha for both of the scales individually (i.e. you should report 2 separate alphas; 1 for each unique scale). Report them here and draw a conclusion on how well the items fit the underlying latent concept. a. Are there any items that fit the concept better than others? Make sure to mention for each scale if any items should be removed to make the scale more reliable.

SDO - The raw alpha is .63 once Cur_equ (T1) and Gro_equ (T2) are reverse-keyed (.32 without reversing), which is modest reliability, below the conventional .70 benchmark. No item should be removed: dropping Cur_equ (T1) leaves the raw alpha at .63, and dropping any other item lowers it (Gro_equ to .55, push_equ to .54, ideal_equ to .48). Cur_equ is therefore the weakest item (r.drop = .29), while ideal_equ (T3) fits the concept best: it has the highest item-rest correlation (r.drop = .50), and removing it lowers alpha the most.

RWA - The raw alpha is .72 once Open_minded (U1) and free_think (U2) are reverse-keyed (.41 without reversing), which is acceptable reliability. The only item I would immediately remove is Open_minded (U1): removing it raises the raw alpha to .75. Removing free_think (U2) lowers the alpha by only .01 (to .71), so I would probably remove it as well since the impact is so small. The two open-minded thinking questions therefore do not fit this concept as well as the other three items (auth_control, str_lead, and tradition), whose removal would lower alpha to between .63 and .64.

Question 2

Perform an exploratory factor analysis on all items extracting 2 factors to begin (i.e. run one large factor analysis that combines items from all three scales). After evaluating these results, if you think you need to extract a different number factors do so.

  1. What is the appropriate number of factors to best describe the items?
  2. Be sure to explain why you selected the number of factors you did.
  • Four. On the scree plot the fourth factor is still above the eigenvalue-of-1 line, although the curve has not flattened out yet. I considered five because the curve was still not flat, but the fifth falls below the line, so I settled on four.
  3. What type of factor analysis and rotation, if any, did you use?
  • I used principal component factoring (PCF, run with principal()) with a varimax (orthogonal) rotation.

Question 3

Present your final factor analysis results including the eigenvalues and factor loadings & draw conclusions on each scale

  1. What are the eigenvalues for each factor?
  • RC1 - 2.24
  • RC2 - 1.71
  • RC4 - 1.50
  • RC3 - 1.48
  • These are the SS loadings of the rotated components in the principal() output; together the four components account for 77% of the variance.
  2. Which variables combine to make which latent concepts?
  • RC1
  • auth_control - .81
  • str_lead - .86
  • tradition - .80
  • RC2
  • ideal_equ - .89
  • push_equ - .90
  • RC4
  • Cur_equ - .76
  • Gro_equ - .89
  • RC3
  • Open_minded - .88
  • free_think - .80
  3. Give the latent concepts an informative name so others know what it measures
  • RC1 - Authoritarian
  • RC2 - Socialist equality
  • RC4 - Current government state
  • RC3 - Democratic

Question 4

Draw conclusions on which items are measuring the same underlying latent concept and which ones are not.

  1. Make a recommendation on if Social Dominance Orientation and Right Wing Authoritarianism is measuring the same or different underlying concepts.

The four factors share an overarching theme, but no factor mixed items from the two scales: each one contained items from only one of them. So I would say SDO and RWA measure different underlying concepts, such as different forms of government, although both touch the same broad theme of government or political beliefs.

  2. Within each scale, is there only one dimension or best described as multi-dimensional?

I would say multi-dimensional. Within each scale the items split across two components (SDO: Cur_equ and Gro_equ versus ideal_equ and push_equ; RWA: Open_minded and free_think versus auth_control, str_lead, and tradition). There are also many things that could go into these forms of government. For example, socialist equality could involve more than just equality.

Question 5

Create a new variable which creates the latent concept from the individual survey items for each of your recommended scales

  1. i.e. actually create the new combined scale from the individual survey questions as done in the alpha tutorial
  • The last pieces of code, under "Creating the variables by concept", create each of those variables separately.