Factor Analysis - Polychoric

Faith survey

Author

Gonzalo Talavera

Published

March 12, 2024

1 Data preparation

1.1 Original data

as.data.frame(question_wording)
                                                                                                                                question_wording
q01                                                                                                                            What is your age?
q02                                                                                                                         What is your gender?
q03                                                                                                            What is your geographic location?
q04                                                                                           What role do you have in your religious community?
q05                                                                                                          What is your religious affiliation?
q06                             Does your religious leader's opinion of the COVID-19 vaccine affect whether or not you will receive the vaccine?
q07                                            Do you think most religious leaders in your community agree with COVID-19 vaccination for adults?
q08                                                                                                      Have you received COVID-19 vaccination?
q09                                                                                      Would you get the COVID-19 vaccine if it was available?
q10                                    What is the top reason that you have not received the vaccine or are not sure about the COVID-19 vaccine?
q11                                Which of the following people will influence you or has influenced you the most to take the COVID-19 vaccine?
q12                                       Which of the following factors would influence or did influence you the most to take COVID-19 vaccine?
q13                                                                                                Do you have a child who is under 5 years old?
q14                        How important do you think it is for your child to get regular vaccination, for example against measles, polio, etc.?
q15                                                  Who is the most influential in your decision related to regular vaccination for your child?
q16                                                     Did you continue taking your child for regular vaccination during the COVID-19 pandemic?
q17                                         Do you think that most religious leaders in your community support regular vaccination for children?
q18                                                   How does your religious belief influence your decision about gettng your child vaccinated?
q04simplified                                                                                                                  Role (Simplified)
q05simplified                                                                                                 Religious affiliation (Simplified)
q14simplified How important do you think it is for your child to get regular vaccination, for example against measles, polio, etc.? (Simplified)
q15simplified                           Who is the most influential in your decision related to regular vaccination for your child? (Simplified)
q16simplified                               Did you continue taking your child for regular vaccination during the COVID-19 pandemic (Simplified)
q17simplified                  Do you think that most religious leaders in your community support regular vaccination for children? (Simplified)

1.2 Simplification of levels

# Collapse levels of q06 for not sure and prefer not to answer
simplified_data <- lapply(list_data, function(country_data) {
    country_data$q06simplified <- fct_collapse(
        country_data$q06, 
        "Not sure / prefer not to answer" = c("not sure", "prefer not to answer"))
    country_data$q06simplified <- fct_recode(
        country_data$q06simplified, 
        "Yes, it matters" = "yes, it matters what my religious leaders believe about the covid-19 vaccine and if they support it",
        "No, it doesn't matter" = "no, it does not matter to me what my religious leaders believe about the covid-19 vaccine")
    country_data$q06simplified <- fct_relevel(
        country_data$q06simplified, 
        "No, it doesn't matter", 
        "Not sure / prefer not to answer",
        "Yes, it matters")
    
    # Collapse levels of q07
    # country_data$q07 <- fct_collapse(
    #     country_data$q07,
    #     "No/not sure" = c("not sure", "no"))
    
    country_data$q04simplified <- fct_collapse(
        country_data$q04,
        "Member of a religious community" = c("religious leader",
                                              "member of religious community"))
        
    # country_data$q05simplified <- fct_collapse(
    #     country_data$q05,
    #     "Other religions" = c(
    #         "sikh",
    #                           "other",
    #                           "jewish",
    #                           "baha'i",
    #                           "hindu",
    #                           "traditional religions",
    #                           "not religious",
    #                           "buddhist"))
    
    # country_data$q05simplified <- fct_relevel(
    #     country_data$q05simplified,
    #     c(
    #     "muslim",
    #     "christian",
    #     "Other religions"))
    
    # country_data$q14simplified <- fct_collapse(
    #     country_data$q14,
    #     "Somehow important / not sure" = c("somehow important", "not sure")
    # )
    
    country_data$q14binary <- fct_collapse(
        country_data$q14,
        "Very / somehow important" = c("very important", "somehow important"),
        "Not at all important / not sure" = c("not sure", "not at all important")
    )
    
        country_data$q15simplified <- fct_collapse(
        country_data$q15,
        "Family/friends/other" = c("family member", "peer/friend", "other"),
    )   
    country_data$q15simplified <- fct_relevel(
        country_data$q15simplified,
        c("religious leader", "health worker"), after = Inf
    )
    
    country_data$q16simplified <- fct_collapse(
        country_data$q16,
        "No/not sure" = c("no", "not sure")
    )   
 
    country_data$q17simplified <- fct_collapse(
        country_data$q17,
        "No/not sure" = c("no", "not sure"),
        "Only some leaders" = c("only some religious leaders support vaccination for children")
    )   
    # country_data$q17simplified <- fct_rev(
    #     country_data$q17simplified
    # )
    
    country_data$q18simplified <- fct_collapse(
        country_data$q18,
        "does not affect/not sure" = c("does not affect decision for child vaccination",
                                       "not sure")
    )   
    
    return(country_data)
})

# Wrap levels
wrapped_data <- lapply(simplified_data, function(country_data) {
    levels(country_data$q04) <- str_wrap(levels(country_data$q04), width = 26)
    levels(country_data$q06) <- str_wrap(levels(country_data$q06), width = 20)
    return(country_data)
})

#bind rows in wrapped data adding a column for country using each df name
all_data <- bind_rows(wrapped_data) %>%
    mutate(country = rep(names(wrapped_data), sapply(wrapped_data, nrow))) |> 
    mutate(country = factor(country))

1.3 Variable recoding

Selecting the variables to be used in the regression analysis, and encoding them as numeric. Country not included.

# Factor Analysis data
fa_data <- tibble(
    # Outcome vars
    has_covidVax = case_when(                                           # q08
        all_data$q08 == "yes" ~ 1,
        all_data$q08 == "no" ~ 0,
        TRUE ~ NA_integer_
        ),
    continued_regVax = case_when(                                        # q16
        all_data$q16simplified == "yes" ~ 1,
        all_data$q16simplified == "No/not sure" ~ 0,
        TRUE ~ NA_integer_
        ),
    is_rfp = case_when(                                         # contact_type
        all_data$contact_type == "Religions For Peace" ~ 1,
        all_data$contact_type == "RDD / 3-2-1" ~ 0,
        TRUE ~ NA_integer_
    ),
    is_member_or_leader = case_when(                                    # q04
        all_data$q04 == "member of religious\ncommunity" ~ 1,
        all_data$q04 == "religious leader" ~ 1,
        all_data$q04 == "not part of any religious\ncommunity" ~0,
        TRUE ~ NA_integer_
    ),
    # Indep vars
    age_group = as.integer(all_data$q01),                               # q01
    is_female = case_when(                                              # q02
        all_data$q02 == "female" ~ 1,
        all_data$q02 == "male" ~ 0,
        TRUE ~ NA_integer_  # Dropping gender = "prefer not to answer"
        ),
    is_rural = case_when(                                               # q03
        all_data$q03 == "rural" ~ 1,
        all_data$q03 == "urban" ~ 0,
        TRUE ~ NA_integer_
        ),
    leaders_endorse_covidVax = case_when(                               # q07
        all_data$q07 == "yes" ~ 2,
        all_data$q07 == "not sure" ~ 1,
        all_data$q07 == "no" ~ 0,
        TRUE ~ NA_integer_
        ),
    would_receive_covidVax = case_when(                                 # q09
        all_data$q09 == "yes" ~ 2,
        all_data$q09 == "not sure" ~ 1,
        all_data$q09 == "no" ~ 0,
        TRUE ~ NA_integer_
        ),
    top_reason_noVax_religious = case_when(                           # q10
        all_data$q10 %in% c(
            "i believe god/allah/adonai/gods and goddesses/higher power will protect me from covid-19",
            "goes against my religious belief") ~ 1,
        is.na(all_data$q10) ~ NA_integer_,
        TRUE ~ 0
            ),
    top_reason_noVax_riskIsLow = case_when(                             # q10
        all_data$q10 %in% c(
            "i do not feel at risk of catching covid-19",
            "it's not that serious. if i catch covid-19 i can recover without vaccine") ~ 1,
        is.na(all_data$q10) ~ NA_integer_,
        TRUE ~ 0
        ),
    top_reason_noVax_vaxIsBad = case_when(                              # q10
        all_data$q10 %in% c(
            "i am afraid of the potential side effects of the covid-19 vaccine",
            "i am not confident that the vaccine will protect me") ~ 1,
        is.na(all_data$q10) ~ NA_integer_,
        TRUE ~ 0
        ),
    covidVaxInfluencer_religiousLeader = case_when(                    # q11
        all_data$q11 == "religious leader" ~ 1,
        is.na(all_data$q11) ~ NA_integer_,
        TRUE ~ 0
        ),
    covidVaxInfluencer_healthWorker = case_when(                       # q11
        all_data$q11 == "health worker" ~ 1,
        is.na(all_data$q11) ~ NA_integer_,
        TRUE ~ 0
        ),
    covidFactorInDecision_religious = case_when(                        # q12
        all_data$q12 %in% c(
            "seeing religious leaders lead by example through taking the vaccine publicly",
            "public statements by influential persons") ~ 1,
        is.na(all_data$q12) ~ NA_integer_,
        TRUE ~ 0
        ),
    has_children_under_5 = case_when(                                   # q13
        all_data$q13 == "yes" ~ 1,
        all_data$q13 == "no" ~ 0,
        TRUE ~ NA_integer_
        ),
    regVax_importance =  as.integer(                                   # q14
        fct_relevel(all_data$q14, "not at all important", "not sure", "somehow important", "very important")
        ),
    regVaxInfluencer_religiousLeader = case_when(                       # q15
        all_data$q15simplified == "religious leader" ~ 1,
        is.na(all_data$q15simplified) ~ NA_integer_,
        TRUE ~ 0
        ),
    regVaxInfluencer_healthWorker = case_when(                          # q15
        all_data$q15simplified == "health worker" ~ 1,
        is.na(all_data$q15simplified) ~ NA_integer_,
        TRUE ~ 0
        ),
    leaders_endorse_regVax = as.integer(                                # q17
        all_data$q17
        ),
    religious_belief_encourages_regVax = case_when(                      # q18
        all_data$q18 == "encourages child vaccination" ~ 1,
        is.na(all_data$q18) ~ NA_integer_,
        TRUE ~ 0
        )
    ) 
head(fa_data)
# A tibble: 6 × 21
  has_covidVax continued_regVax is_rfp is_member_or_leader age_group is_female
         <dbl>            <dbl>  <dbl>               <dbl>     <int>     <dbl>
1            1               NA      0                   0         4         1
2            1                1      0                   1         3         1
3            1               NA      0                   0         4         0
4            1                1      0                   1         4         0
5            1               NA      0                   1         4         0
6            1                1      0                   1         4         0
# ℹ 15 more variables: is_rural <dbl>, leaders_endorse_covidVax <dbl>,
#   would_receive_covidVax <dbl>, top_reason_noVax_religious <dbl>,
#   top_reason_noVax_riskIsLow <dbl>, top_reason_noVax_vaxIsBad <dbl>,
#   covidVaxInfluencer_religiousLeader <dbl>,
#   covidVaxInfluencer_healthWorker <dbl>,
#   covidFactorInDecision_religious <dbl>, has_children_under_5 <dbl>,
#   regVax_importance <int>, regVaxInfluencer_religiousLeader <dbl>, …

1.4 Summarize data

# Use describe() function from the psych package
summary_df <- psych::describe(fa_data)

# Selecting only mean, sd, range, and missing
summary_df[, c("n", "mean", "sd", "min", "max","range")]
                                       n mean   sd min max range
has_covidVax                       19846 0.76 0.43   0   1     1
continued_regVax                    9677 0.76 0.43   0   1     1
is_rfp                             19847 0.28 0.45   0   1     1
is_member_or_leader                19846 0.64 0.48   0   1     1
age_group                          19846 2.32 1.10   1   4     3
is_female                          18929 0.41 0.49   0   1     1
is_rural                           19846 0.46 0.50   0   1     1
leaders_endorse_covidVax           19846 1.50 0.73   0   2     2
would_receive_covidVax              4822 1.01 0.86   0   2     2
top_reason_noVax_religious          4816 0.14 0.35   0   1     1
top_reason_noVax_riskIsLow          4816 0.40 0.49   0   1     1
top_reason_noVax_vaxIsBad           4816 0.34 0.47   0   1     1
covidVaxInfluencer_religiousLeader 19846 0.08 0.27   0   1     1
covidVaxInfluencer_healthWorker    19846 0.38 0.49   0   1     1
covidFactorInDecision_religious    19846 0.27 0.45   0   1     1
has_children_under_5               19846 0.49 0.50   0   1     1
regVax_importance                   9684 3.26 1.05   1   4     3
regVaxInfluencer_religiousLeader    9679 0.07 0.25   0   1     1
regVaxInfluencer_healthWorker       9679 0.46 0.50   0   1     1
leaders_endorse_regVax              9642 3.19 1.04   1   4     3
religious_belief_encourages_regVax  9671 0.62 0.49   0   1     1

1.5 Creating candidate datasets for factor analysis

regVax_only_df <- fa_data %>%
    select(
        -(would_receive_covidVax:top_reason_noVax_vaxIsBad),
        -has_children_under_5,
        ) %>%
    filter(!is.na(regVax_importance))

covidVax_df <- fa_data %>%
    select(
        -continued_regVax,
        -(regVax_importance:religious_belief_encourages_regVax),
        -(top_reason_noVax_religious:top_reason_noVax_vaxIsBad)
        )

possible_datasets <- list(
  regVax_only_df = regVax_only_df,
  covidVax_df = covidVax_df
)

1.6 Check for sparse categories

covidVaxInfluencer_religiousLeader (mean 0.0693253) and regVaxInfluencer_religiousLeader (mean 0.0780006) have few observations in some categories, but in any case, more than 5%.

1.7 Remove Zero Cross-tab Cells

Influence of religious leaders and health workers in the decision to vaccinate are mutually exclusive, so they yield zero cells in the cross-tabulation. Therefore, we will remove those variables from the analysis.

regVax_no_zero <- regVax_only_df %>%
    select(
        -covidVaxInfluencer_religiousLeader,
        -regVaxInfluencer_religiousLeader,
        -has_covidVax, 
        -continued_regVax
    )

covidVax_no_zero <- covidVax_df %>%
    select(
        -covidVaxInfluencer_religiousLeader,
        -has_covidVax,
        -has_children_under_5
    )

1.8 Convert some variables to numeric

regVax_no_zero_num <- regVax_no_zero %>% 
    mutate(
        age_group = as.numeric(age_group),
        regVax_importance =as.numeric(regVax_importance),
        leaders_endorse_regVax = as.numeric(leaders_endorse_regVax)) 

covidVax_no_zero_num <- covidVax_no_zero %>% 
    mutate(
        age_group = as.numeric(age_group),
        leaders_endorse_covidVax = as.numeric(leaders_endorse_covidVax)) 

1.9 Use only complete cases

regVax_no_zero_complete <- regVax_no_zero_num %>%
    filter(
        complete.cases(.)
    )

covidVax_no_zero_complete <- covidVax_no_zero_num %>%
    filter(
        complete.cases(.)
    )

1.10 Assign resulting dataframes to variables

regVax <- regVax_no_zero_complete
covidVax <- covidVax_no_zero_complete

2 Exploratory factor analysis

2.1 Regular vaccination

2.1.1 Polychoric correlation

poly_cor <- polychoric(regVax)
Warning in polychoric(regVax): The items do not have an equal number of
response alternatives, global set to FALSE.
rho <- poly_cor$rho
### Thresholds/Scaling results
# poly_cor$tau

2.1.2 Plot the correlation matrix

cor.plot(poly_cor$rho, numbers=T, upper=FALSE, main = "Polychoric Correlation", show.legend = FALSE)

2.1.3 Scree plot

The scree plot comparing to random data suggests a 4-factor solution

# Scree plot
fa.parallel(rho, fm="pa", fa="fa", main = "Scree Plot")
Warning in fa.parallel(rho, fm = "pa", fa = "fa", main = "Scree Plot"): It
seems as if you are using a correlation matrix, but have not specified the
number of cases. The number of subjects is arbitrarily set to be 100

Parallel analysis suggests that the number of factors =  4  and the number of components =  NA 

2.1.4 Factor analysis

  • ML3 emerges as a factor apparently related to “adherence to religious beliefs”, as leaders’ endorsement of regular vaccination and religious beliefs encouraging vaccination are the most important variables in this factor. More over, leaders’ endorsement of COVID vaccination has also a modearte loading into it.
  • ML1 seems to be related to the influence of health workers.
  • ML2 has a unique variable loading into it wich is the membership or not to Religions for Peace (RFP). Reported membership to a religious community (is_member_or_leader) does not load into this factor.
  • ML4 seems to relfect a more complex dimension, as the importance of regular vaccination has a positive loading, but the religous factor (those considering that seeing the religious leader getting the vaccine) is negatively loaded. Perhaps this factor groupsa respondents that place an importance to the vaccine but are not influenced by religious leaders.
# Polychoric factor analysis
poly_model = fa(regVax, nfactor=4, cor="poly", fm="mle", rotate = "varimax")
Warning in polychoric(r, correct = correct, weight = weight): The items do not
have an equal number of response alternatives, global set to FALSE.
poly_model$loadings

Loadings:
                                   ML3    ML1    ML2    ML4   
is_rfp                              0.289         0.947       
is_member_or_leader                 0.331  0.119  0.279  0.275
age_group                                         0.169 -0.215
is_female                          -0.199 -0.113              
is_rural                                         -0.219       
leaders_endorse_covidVax            0.519  0.146              
covidVaxInfluencer_healthWorker     0.261  0.538         0.106
covidFactorInDecision_religious     0.451         0.114  0.476
regVax_importance                                 0.110 -0.588
regVaxInfluencer_healthWorker       0.159  0.984              
leaders_endorse_regVax              0.718                     
religious_belief_encourages_regVax  0.786  0.148              

                 ML3   ML1   ML2   ML4
SS loadings    1.939 1.350 1.089 0.724
Proportion Var 0.162 0.112 0.091 0.060
Cumulative Var 0.162 0.274 0.365 0.425

2.2 COVID-19 vaccination

2.2.1 Polychoric correlation

poly_cor <- polychoric(covidVax)
Warning in polychoric(covidVax): The items do not have an equal number of
response alternatives, global set to FALSE.
rho <- poly_cor$rho
### Thresholds/Scaling results
# poly_cor$tau

2.2.2 Plot the correlation matrix

cor.plot(poly_cor$rho, numbers=T, upper=FALSE, main = "Polychoric Correlation", show.legend = FALSE)

2.2.3 Scree plot

The scree plot comparing to random data suggests a 2-factor solution

# Scree plot
fa.parallel(rho, fm="pa", fa="fa", main = "Scree Plot")
Warning in fa.parallel(rho, fm = "pa", fa = "fa", main = "Scree Plot"): It
seems as if you are using a correlation matrix, but have not specified the
number of cases. The number of subjects is arbitrarily set to be 100

Parallel analysis suggests that the number of factors =  3  and the number of components =  NA 

2.2.4 Factor analysis

The model for COVID-19, which excludes questions made only tho those having children under 5 (related to regular vaccination), explains only 0.23 of the variance. The scree plot suggests a 2-factor solution.

In this case, the factors are less clear than in the regular vaccination model. - ML1 is highly associated to being from RFP or not, and moderately associated to declaring belonging to a religious community. - ML2 is associated to external factors (the influence of health workers and leaders), and personal component on the willingness to get vaccinated.

# Polychoric factor analysis
poly_model = fa(covidVax, nfactor=2, cor="poly", fm="mle", rotate = "varimax")
Warning in polychoric(r, correct = correct, weight = weight): The items do not
have an equal number of response alternatives, global set to FALSE.
poly_model$loadings

Loadings:
                                ML1    ML2   
is_rfp                           0.972  0.225
is_member_or_leader              0.406  0.244
age_group                        0.245 -0.135
is_female                              -0.191
is_rural                        -0.261       
leaders_endorse_covidVax                0.373
would_receive_covidVax                  0.474
covidVaxInfluencer_healthWorker  0.164  0.489
covidFactorInDecision_religious         0.467

                 ML1   ML2
SS loadings    1.277 0.987
Proportion Var 0.142 0.110
Cumulative Var 0.142 0.252

3 Concluding remarks

3.1 Regular vaccination

  • ML3 emerges as a factor apparently related to “adherence to religious beliefs”, as leaders’ endorsement of regular vaccination and religious beliefs encouraging vaccination are the most important variables in this factor. More over, leaders’ endorsement of COVID vaccination has also a modearte loading into it.
  • ML1 seems to be related to the influence of health workers.
  • ML2 has a unique variable loading into it wich is the membership or not to Religions for Peace (RFP). Reported membership to a religious community (is_member_or_leader) does not load into this factor.
  • ML4 seems to relfect a more complex dimension, as the importance of regular vaccination has a positive loading, but the religous factor (those considering that seeing the religious leader getting the vaccine) is negatively loaded. Perhaps this factor groupsa respondents that place an importance to the vaccine but are not influenced by religious leaders.

3.2 COVID-19 vaccination

In this case, the factors are less clear than in the regular vaccination model.

  • ML1 is highly associated to being from RFP or not, and moderately associated to declaring belonging to a religious community.
  • ML2 is associated to external factors (the influence of health workers and leaders), and personal component on the willingness to get vaccinated.

4 References

  • Wu, Jiayu (2018). https://alice86.github.io/2018/04/08/Factor-Analysis-on-Ordinal-Data-example-in-R-(psych,-homals)/