Week 1_corr_revised

This section defines a simple function that flattens a correlation matrix, then formats the correlations with GG_Level into a table with five columns:

Column 1 (Variable): the variable name.

Column 2 (cor): the Pearson correlation coefficient between the variable and GG_Level.

Column 3 (p_val): the p-value of that correlation.

Column 4 (imp): "sig" if p_val < 0.05, otherwise "non-sig".

Column 5 (match): "match" if the correlation has the direction we expected, otherwise "mismatch".

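library(Hmisc); library(corrr); library(dplyr)         # rcorr(); correlate(), focus(); %>%, mutate(), recode()
library(ggplot2); library(plotly); library(ggmosaic)   # plots; ggplotly(); geom_mosaic(), product()
corre <- read.csv("~/Downloads/GG_identification_Nov_2018__Muslim_Americans_Correlational - 20190716.csv") %>%
  dplyr::select(GG_Level:Gender)   # keep the columns from GG_Level through Gender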

# flattenCorrMatrix(): turn a correlation matrix and its p-value matrix
# into a long table with one row per pair of variables

# ++++++++++++++++++++++++++++
# flattenCorrMatrix
# ++++++++++++++++++++++++++++
# cormat : matrix of the correlation coefficients
# pmat : matrix of the correlation p-values
flattenCorrMatrix <- function(cormat, pmat) {
  ut <- upper.tri(cormat)   # each pair once: cells above the diagonal
  data.frame(
    row    = rownames(cormat)[row(cormat)[ut]],
    column = rownames(cormat)[col(cormat)[ut]],
    cor    = cormat[ut],
    p      = pmat[ut]
  )
}
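Here is a minimal sketch of what flattenCorrMatrix returns, using a made-up 3 x 3 matrix. The values and the variable names B and C are hypothetical. upper.tri() walks the matrix column by column, so every pair that involves the first variable has that variable in the `row` column. This is why filtering on row == "GG_Level" catches all of its pairs, provided GG_Level is the first column of corre.

```r
# Hypothetical 3 x 3 correlation matrix (made-up values, illustration only)
m <- matrix(c(1.0, 0.5, 0.2,
              0.5, 1.0, 0.3,
              0.2, 0.3, 1.0),
            nrow = 3,
            dimnames = list(c("GG_Level", "B", "C"), c("GG_Level", "B", "C")))

ut <- upper.tri(m)   # TRUE only above the diagonal
flat <- data.frame(row    = rownames(m)[row(m)[ut]],
                   column = rownames(m)[col(m)[ut]],
                   cor    = m[ut],
                   stringsAsFactors = FALSE)
flat   # pairs in order: GG_Level-B, GG_Level-C, B-C
```

If GG_Level were not the first column, some of its pairs would appear with GG_Level in `column` instead. In that case the filter would need to check both columns.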
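res2 <- rcorr(as.matrix(corre))   # Pearson r in $r, p-values in $P (missing values deleted pairwise)
a <- flattenCorrMatrix(res2$r, res2$P) %>%
  dplyr::filter(row == "GG_Level") %>%   # GG_Level is the first column, so it is always in `row`
  dplyr::select(-row) %>%
  rename(Variable = column, p_val = p)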


a$imp <- ifelse(a$p_val < 0.05, "sig", "non-sig")
# Only Despair had a sign opposite to what we expected, so it is flagged by name
a$match <- ifelse(a$Variable == "Despair", "mismatch", "match")

a

##                         Variable         cor        p_val     imp    match
## 1        Simon_Dual_identity_SCL  0.61911485 0.000000e+00     sig    match
## 2                      Angst_SCL -0.46976853 7.929897e-10     sig    match
## 3                 Stereotype_SCL -0.54019036 5.742073e-13     sig    match
## 4                Symb_Threat_SCL -0.65888547 0.000000e+00     sig    match
## 5                Real_Threat_SCL -0.48918273 1.607163e-10     sig    match
## 6                            ITT -0.62109916 0.000000e+00     sig    match
## 7              GG_Dehumanization -0.62698388 0.000000e+00     sig    match
## 8       Peception_of_GC_mediator  0.20009222 1.376623e-02     sig    match
## 9         Peception_of_GC_bridge  0.29076239 2.925273e-04     sig    match
## 10  Peception_of_GC_fifth_column -0.38659598 9.433749e-07     sig    match
## 11      Peception_of_GC_traitors -0.37068802 2.796120e-06     sig    match
## 12        Peception_of_GC_unique  0.34837152 1.168411e-05     sig    match
## 13          Peception_of_GC_weak -0.33753882 2.251719e-05     sig    match
## 14          Peception_of_GC_lmlm  0.20406880 1.196051e-02     sig    match
## 15 Positive_perception_of_GC_SCL  0.32186687 5.575820e-05     sig    match
## 16 Negative_perception_of_GC_SCL -0.40295606 2.903061e-07     sig    match
## 17               Common_fate_SCL  0.49151756 1.693539e-10     sig    match
## 18                       Hatered -0.19867232 1.480067e-02     sig    match
## 19                          Fear -0.40369242 3.014173e-07     sig    match
## 20                         Anger -0.31599029 8.178291e-05     sig    match
## 21                          Hope  0.32416490 5.192030e-05     sig    match
## 22                       Empathy  0.47958975 5.321639e-10     sig    match
## 23                       Despair  0.12752215 1.199195e-01 non-sig mismatch
## 24                      CIIM_SCL  0.43824028 2.037978e-08     sig    match
## 25                       SDO_SCL -0.36770748 3.673842e-06     sig    match
## 26            Identification_SCL -0.18858331 2.082624e-02     sig    match
## 27              Essentialism_SCL -0.26991626 8.367225e-04     sig    match
## 28                    Policy_SCL -0.41882562 9.611181e-08     sig    match
## 29       Resource_allocation_SCL  0.44426094 1.234781e-08     sig    match
## 30                           Age -0.10428984 2.040612e-01 non-sig    match
## 31                        Gender -0.07514975 3.607239e-01 non-sig    match
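In the code above, the match column is set by name, because Despair was the only variable with an unexpected sign. The sketch below instead derives imp and match from an explicit list of expected signs. It uses base R's cor.test() in place of Hmisc::rcorr(). The toy data and the expected_sign vector are hypothetical; with the real data, expected_sign would list the direction we expected for each variable.

```r
# Hypothetical toy data standing in for the survey columns (NOT the real data)
set.seed(1)
n <- 50
toy <- data.frame(GG_Level = rnorm(n))
toy$Empathy <- toy$GG_Level + rnorm(n, sd = 0.5)    # built to correlate positively
toy$Fear    <- -toy$GG_Level + rnorm(n, sd = 0.5)   # built to correlate negatively
toy$Despair <- rnorm(n)                             # unrelated noise

# Hypothetical expected directions: +1 = positive, -1 = negative
expected_sign <- c(Empathy = 1, Fear = -1, Despair = -1)

vars <- setdiff(names(toy), "GG_Level")
tab <- do.call(rbind, lapply(vars, function(v) {
  ct <- cor.test(toy$GG_Level, toy[[v]])   # Pearson correlation test (the default method)
  data.frame(Variable = v, cor = unname(ct$estimate), p_val = ct$p.value,
             stringsAsFactors = FALSE)
}))
tab$imp   <- ifelse(tab$p_val < 0.05, "sig", "non-sig")
tab$match <- ifelse(sign(tab$cor) == unname(expected_sign[tab$Variable]),
                    "match", "mismatch")
tab
```

Keeping the expected directions in one named vector makes the hypotheses explicit. It also means the match column updates automatically if the data change.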

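In the following bar chart, variables whose correlation with GG_Level has the expected direction ("match") are colored green, and those with the opposite direction ("mismatch") are colored red. An asterisk marks each bar whose correlation is significant at the 0.05 level. Only Despair correlates with GG_Level in the opposite direction to what we expected, and its p-value (1.199195e-01, about 0.12) is greater than 0.05, so this correlation is not significant. P-values printed as 0.000000e+00 in the table above are too small to be distinguished from zero numerically; they are not exactly zero.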

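p <- a %>%
  ggplot(aes(x = Variable, y = cor, fill = match)) +
  scale_fill_manual(values = c(mismatch = "red", match = "green")) +
  geom_col() +                                  # same as geom_bar(stat = "identity")
  ylab("Correlation with GG_Level") +
  xlab("Variable") +
  geom_hline(yintercept = 0, linewidth = 1) +   # linewidth needs ggplot2 >= 3.4.0; use size on older versions
  coord_flip()

label.df <- a %>% dplyr::filter(p_val < 0.05)   # significant correlations only
p + geom_text(data = label.df, label = "*")     # one asterisk per significant bar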
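The plot above uses a single asterisk for p < 0.05. To show several levels of significance, p-values could be mapped to stars with cut(). This sketch assumes the common thresholds 0.05, 0.01 and 0.001. sig_stars() is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: map p-values to conventional significance stars
sig_stars <- function(p) {
  as.character(cut(p,
                   breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
                   labels = c("***", "**", "*", ""),
                   right  = FALSE))   # intervals closed on the left, so p = 0.05 gets no star
}

stars <- sig_stars(c(0.2, 0.03, 0.004, 1e-5, 0.05))
stars   # "" "*" "**" "***" ""
```

With the real data, this could be applied as `a$stars <- sig_stars(a$p_val)`. The stars could then be drawn with `p + geom_text(data = a, aes(label = stars))`. The `data = a` argument is needed because `p` was built before the stars column existed.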


In the next correlation plot, positive correlations are green: higher values of these variables go with a higher GG_Level, that is, with viewing Muslim Americans as having a dual identity. Negative correlations are red: higher values of these variables go with a lower GG_Level, that is, with viewing Muslim Americans as having a single identity.

The variables are sorted by their correlation with GG_Level, highest on top. The plot is interactive: hovering over a bar shows the variable name, its correlation with GG_Level, and whether the correlation is positive or negative. Simon_Dual_identity_SCL ranks highest and Symb_Threat_SCL ranks lowest.

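x <- corre %>%
  correlate() %>%        # Pearson correlations, pairwise complete observations
  focus(GG_Level) %>%    # keep only each variable's correlation with GG_Level
  rename(Correlation_with_GG_Level = GG_Level)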

## 
## Correlation method: 'pearson'
## Missing treated using: 'pairwise.complete.obs'
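The message above says that missing values are handled pairwise, which is also how Hmisc::rcorr() treats them. Each correlation uses all rows where that particular pair of variables is observed. The toy example below uses made-up data to show how this differs from listwise deletion, where a row with any missing value is dropped for every pair.

```r
# Made-up data with one missing value in y (row 1)
demo_df <- data.frame(g = c(1, 2, 3, 4, 5),
                      x = c(2, 1, 4, 3, 5),
                      y = c(NA, 2, 1, 4, 5))

everything <- cor(demo_df)                                   # default: any NA gives NA for that pair
pairwise   <- cor(demo_df, use = "pairwise.complete.obs")    # each pair uses its own complete rows
listwise   <- cor(demo_df, use = "complete.obs")             # row 1 dropped for every pair

everything["g", "y"]   # NA
pairwise["g", "x"]     # 0.8, computed from all 5 rows
listwise["g", "x"]     # about 0.83, computed from rows 2 to 5 only
```

Pairwise deletion keeps more data for each correlation. The trade-off is that different correlations in the same table can rest on different subsets of participants.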

x$color <- ifelse(x$Correlation_with_GG_Level > 0, "positive", "negative")
# Newer corrr releases name the variable column `term` instead of `rowname`;
# if so, replace `rowname` with `term` below
y <- x %>%
  mutate(rowname = factor(rowname,
                          levels = rowname[order(Correlation_with_GG_Level)])) %>%   # order by correlation
  ggplot(aes(x = rowname, y = Correlation_with_GG_Level, fill = color)) +
  scale_fill_manual(values = c(positive = "green", negative = "red")) +
  geom_col() +
  ylab("Correlation with GG_Level") +
  xlab("Variable") +
  geom_hline(yintercept = 0, linewidth = 1) +   # linewidth needs ggplot2 >= 3.4.0
  coord_flip()
ggplotly(y)   # interactive version: hover over a bar to see its values
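The ordering step in the code above sets the factor levels with order(). Because coord_flip() draws the first level at the bottom, the largest correlation ends up on top. Here is a minimal sketch on made-up values, with stats::reorder() shown as an equivalent shortcut. The names a, b and c are hypothetical.

```r
# Made-up correlations for three hypothetical variables
cors <- data.frame(rowname = c("a", "b", "c"),
                   Correlation_with_GG_Level = c(0.3, -0.5, 0.1),
                   stringsAsFactors = FALSE)

# Approach used above: explicit levels, lowest correlation first
by_order <- factor(cors$rowname,
                   levels = cors$rowname[order(cors$Correlation_with_GG_Level)])

# Shortcut: reorder() sorts levels by the (mean) value for each level
by_reorder <- reorder(cors$rowname, cors$Correlation_with_GG_Level)

levels(by_order)   # "b" "c" "a": after coord_flip(), "a" is drawn on top
```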

Finally, I am interested in whether participants' age and gender are related to GG_Level. Before drawing the mosaic plot, we need to process the raw data:

According to the data legend file, Gender is recoded so that 1 = Male and 2 = Female.
Age is grouped into young adults (19-35), middle-aged adults (36-55) and elder adults (56 and over); ages 18 and under fall into a separate group.
GG_Level is grouped into three levels: -25 to 0 ("single", participants who view Muslim Americans as having a single identity), above 0 up to 25 ("neutral", a neutral attitude), and above 25 up to 50 ("dual", participants who view Muslim Americans as having a dual identity).
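By default, cut() uses right-closed intervals. A value equal to a break point therefore falls in the lower group: age 35 counts as young and age 55 as middle-aged. The lowest break itself is excluded unless include.lowest = TRUE. The sketch below checks these edges on a few hypothetical values; the labels are illustrative.

```r
# Hypothetical GG_Level values at and between the break points
gg_demo <- c(-25, -3, 0, 0.5, 25, 50)
gg_default <- cut(gg_demo, breaks = c(-25, 0, 25, 50),
                  labels = c("single", "neutral", "dual"))   # -25 becomes NA: (-25, 0] excludes it
gg_lowest  <- cut(gg_demo, breaks = c(-25, 0, 25, 50),
                  labels = c("single", "neutral", "dual"),
                  include.lowest = TRUE)                      # first interval becomes [-25, 0]

# Hypothetical ages at the group boundaries
age_demo <- c(18, 19, 35, 36, 55, 56)
age_groups <- cut(age_demo, breaks = c(-Inf, 18, 35, 55, Inf),
                  labels = c("Under_19", "Young(19-35)", "Middle_Aged(36-55)", "Elder(56+)"))

gg_lowest    # single single single neutral neutral dual
age_groups
```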

The mosaic plot shows some interesting patterns:

In all three age groups (young, middle-aged and elder), there are more males than females. For both genders, young participants outnumber elder participants.
Elder participants are more likely to view Muslim Americans as having a single identity, while young adults more often view them as having a dual identity.

corree <- corre %>%
  mutate(Gender = recode(Gender, '1' = "Male", '2' = "Female")) %>%         # codes from the data legend
  mutate(Age = cut(Age, breaks = c(-Inf, 18, 35, 55, Inf),                    # right-closed: (18, 35], (35, 55], ...
                   labels = c("Under_19", "Young(19-35)", "Middle_Aged(36-55)", "Elder(56+)"))) %>%
  mutate(Viewas_dual = cut(GG_Level, breaks = c(-25, 0, 25, 50), include.lowest = TRUE,   # keep GG_Level = -25
                           labels = c("single", "neutral", "dual")))
gg <- corree %>%
  dplyr::filter(!is.na(Viewas_dual), !is.na(Age), !is.na(Gender)) %>%   # drop NAs only in the plotted columns
  ggplot() +
  geom_mosaic(aes(x = product(Viewas_dual, Age, Gender), fill = Gender)) +
  theme(axis.text.x = element_text(angle = 10, hjust = 1))
ggplotly(gg)
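The patterns read from the mosaic plot can be checked numerically with contingency tables. Below is a minimal sketch with hypothetical toy data (not the survey responses). With the real data, the analogous call would be `prop.table(table(corree$Age, corree$Viewas_dual), margin = 1)`.

```r
# Hypothetical toy data (NOT the survey); columns mimic those built above
toy <- data.frame(
  Gender      = c("Male", "Male", "Female", "Male", "Female", "Male"),
  Age         = c("Young", "Elder", "Young", "Young", "Middle", "Elder"),
  Viewas_dual = c("dual", "single", "dual", "neutral", "dual", "single"),
  stringsAsFactors = FALSE
)

age_view   <- table(toy$Age, toy$Viewas_dual)
age_share  <- prop.table(age_view, margin = 1)   # shares of single/neutral/dual within each age group
gender_age <- table(toy$Gender, toy$Age)         # counts of each gender in each age group

age_share
gender_age
```

Setting `margin = 1` conditions on the rows (age groups). This matches the question "within each age group, how do views split?" that the mosaic plot answers visually.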


Weijia Bao

7/21/2019