DATA429/599: Survey Comparative Visualizations

Learning Objectives:

In this lesson students will learn to:

Use survey packages in R
Weight data based on auxiliary information
Create visualizations for survey data

Example: Religion Survey

These data come from the FiveThirtyEight article “When Does Praying In Public Make Others Uncomfortable?”: https://fivethirtyeight.com/features/when-does-praying-in-public-make-others-uncomfortable/

This survey was fielded from July 29 to August 1, 2016.

The original data can be accessed here: https://www.kaggle.com/datasets/tunguz/religion-survey/data.

0. The Data

### DATA ON GITHUB
rel<-read.csv("https://raw.githubusercontent.com/kitadasmalley/Teaching/refs/heads/main/DATA429_599/CODE/religion-survey-results.csv", 
              header=TRUE)

head(names(rel))

## [1] "What.is.your.present.religion..if.any."                                                                        
## [2] "X"                                                                                                             
## [3] "Do.you.consider.yourself.to.be.an.evangelical."                                                                
## [4] "Do.you.attend.religious.services"                                                                              
## [5] "How.often.do.you..Pray.in.public.with.visible.motions..sign.of.the.cross..bowing..prostration..shokeling..etc."
## [6] "How.often.do.you..Pray.in.public.using.some.kind.of.physical.object..rosary..tefillin..etc."

tail(names(rel))

## [1] "How.comfortable.would.you.be.seeing.someone.who.practices.a.different.religion.from.you..Wear.religious.clothing.jewelry..hijab..kippah..wig..kara..turban..cross..etc."                                        
## [2] "How.comfortable.would.you.be.seeing.someone.who.practices.a.different.religion.from.you..Participate.in.a.public.religious.event.on.the.streets..Corpus.Christi.procession..inauguration.of.Torah.scrolls..etc."
## [3] "What.is.your.age."                                                                                                                                                                                              
## [4] "What.is.your.gender."                                                                                                                                                                                           
## [5] "How.much.total.combined.money.did.all.members.of.your.HOUSEHOLD.earn.last.year."                                                                                                                                
## [6] "US.Region"

1. Rename Variables

These are very long variable names. Let’s make these shorter and easier to work with.

shortColNames<-c("Religion", "RelOther", "Evangelical", "AttendServices", 
                 ### SET 1: HOW OFTEN (cols 5 thru 14)
                 "PrayMotions_Often", "PrayObject_Often", "PrayMeals_Often", "PrayFor_Often", 
                 "PrayWith_Often", "RelConvo_Often", "RelAsk_Often", "Dietary_Often", 
                 "WearRel_Often","PublicRel_Often",
                 
                 ### SET 2: HOW COMFORTABLE DO YOU FEEL (cols 15 thru 24)
                 "PrayMotions_Comfort", "PrayObject_Comfort", "PrayMeals_Comfort", "PrayFor_Comfort", 
                 "PrayWith_Comfort", "RelConvo_Comfort", "RelAsk_Comfort", "Dietary_Comfort", 
                 "WearRel_Comfort","PublicRel_Comfort",
                 
                 ### SET 3: HOW COMFORTABLE DO YOU THINK SOMEONE OUTSIDE (cols 25 thru 34)
                 "PrayMotions_Outside", "PrayObject_Outside", "PrayMeals_Outside", "PrayFor_Outside", 
                 "PrayWith_Outside", "RelConvo_Outside", "RelAsk_Outside", "Dietary_Outside", 
                 "WearRel_Outside","PublicRel_Outside",
                 
                 ### SET 4: HOW COMFORTABLE WOULD YOU BE SEEING SOMEONE WHO PRACTICES A DIFFERENT REL (cols 35 thru 44)
                 "PrayMotions_Different", "PrayObject_Different", "PrayMeals_Different", "PrayFor_Different", 
                 "PrayWith_Different", "RelConvo_Different", "RelAsk_Different", "Dietary_Different", 
                 "WearRel_Different","PublicRel_Different",
                 
                 ### DEMOS
                 "Age", "Gender", "Income", "US.Region"
                 
                 )

#### NEW NAMES

colnames(rel)<-shortColNames

#str(rel)
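
Because the new names are matched to columns purely by position, a miscounted vector would mislabel the data. A minimal sketch of a guard, using a small made-up data frame (`toy`, `newNames`, and `nameKey` are hypothetical stand-ins, not objects from this lesson):

```r
# Hypothetical stand-ins for `rel` and `shortColNames`
toy <- data.frame(What.is.your.age. = c("18 - 29", "30 - 44"),
                  What.is.your.gender. = c("Female", "Male"),
                  US.Region = c("Pacific", "Mountain"))
newNames <- c("Age", "Gender", "US.Region")

# Keep the original names next to the short ones so the mapping can be checked
nameKey <- data.frame(original = names(toy), short = newNames)

# Stop early if the number of names does not match the number of columns
stopifnot(length(newNames) == ncol(toy))
colnames(toy) <- newNames

nameKey
```

Printing `nameKey` gives a quick side-by-side check that each short name sits next to the question it is meant to replace.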

2. Weighting the Data

How many people in the sample are from each division (which in this dataset is called US.Region)?

library(tidyverse)

### SAMPLE DIVISION (IN THIS DATASET USE US.Region)
thisSamp2<-rel%>%
  group_by(US.Region)%>%
  summarise(samp=n())

thisSamp2

## # A tibble: 11 × 2
##    US.Region             samp
##    <chr>                <int>
##  1 ""                      13
##  2 "East North Central"   180
##  3 "East South Central"    54
##  4 "Middle Atlantic"      135
##  5 "Mountain"              74
##  6 "New England"           67
##  7 "Pacific"              150
##  8 "Response"               1
##  9 "South Atlantic"       196
## 10 "West North Central"    73
## 11 "West South Central"    97

We can use Census data to help re-balance the data.

### CENSUS DATA
popDiv<-data.frame(DivName=c("East North Central",
                             "East South Central", 
                             "Middle Atlantic", 
                             "Mountain", 
                             "New England", 
                             "Pacific", 
                             "South Atlantic", 
                             "West North Central", 
                             "West South Central"), 
                   pop=c(46798649, 
                         18931477,
                         41601787,
                         23811346,
                         14757573,
                         52833604,
                         63991523,
                         21179519,
                         39500457))

Now we can join the sample counts to the Census data and calculate the weights. Each weight is the division's population divided by its sample count.

### CALCULATE WEIGHTS
divJoin2<-popDiv%>%
  rename(US.Region=DivName)%>%
  left_join(thisSamp2)%>%
  mutate(weight=pop/samp)%>%
  mutate(FPC=sum(popDiv$pop))

## Joining with `by = join_by(US.Region)`

#divJoin2
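
To make the weight concrete, here is a small worked sketch using two divisions copied from the tables above (`toyDiv` is a hypothetical name). Each weight is the number of people in the division that a single respondent stands in for; for East North Central that is 46,798,649 / 180, or about 259,992.

```r
library(dplyr)

# Two divisions copied from the sample counts and Census figures above
toyDiv <- data.frame(US.Region = c("East North Central", "Mountain"),
                     pop  = c(46798649, 23811346),
                     samp = c(180, 74)) %>%
  mutate(weight = pop / samp)   # people represented by each respondent

toyDiv

# Weights times respondents recovers the population totals
all.equal(sum(toyDiv$weight * toyDiv$samp), sum(toyDiv$pop))
```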

Finally, we join the weights back onto the survey data and drop respondents whose division has no match in the Census table.

### JOIN RELIGION DATA
joinRel<-rel%>%
  left_join(divJoin2)%>%
  filter(!is.na(pop))

## Joining with `by = join_by(US.Region)`

#str(joinRel)
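
The filter above removes the respondents with a blank division (13) and the single "Response" value, 14 rows in total. A minimal sketch of how one might inspect the dropped rows before discarding them, using made-up miniature tables (`toyRel` and `toyWts` are hypothetical stand-ins for `rel` and `divJoin2`):

```r
library(dplyr)

# Hypothetical miniature versions of `rel` and `divJoin2`
toyRel <- data.frame(id = 1:4,
                     US.Region = c("Pacific", "", "Response", "Mountain"))
toyWts <- data.frame(US.Region = c("Pacific", "Mountain"),
                     pop = c(52833604, 23811346))

# Rows with no matching division: these are the ones the filter discards
dropped <- anti_join(toyRel, toyWts, by = "US.Region")
dropped

# Rows that survive the join and filter
kept <- toyRel %>%
  left_join(toyWts, by = "US.Region") %>%
  filter(!is.na(pop))
nrow(kept)
```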

3. Perspectives on Comfort

This survey asks a series of questions that consider different perspectives on comfort with expressing and witnessing religious practice.

Consider refining our study to the following questions:

(‘RelConvo_Comfort’) How comfortable do you feel when you: Bring up your religion, unprompted, in conversation
(‘RelConvo_Outside’) How comfortable do you think someone outside your religion would be if they saw you: Bring up your religion, unprompted, in conversation
(‘RelConvo_Different’) How comfortable would you be seeing someone who practices a different religion from you: Bring up his or her own religion, unprompted, in conversation

relComfort<-joinRel%>%
  select("Religion", "RelConvo_Comfort", "RelConvo_Outside", "RelConvo_Different", 
         "weight")

A. Response Options for Comfort

unique(relComfort$RelConvo_Comfort)

## [1] "Very comfortable"       ""                       "Extremely comfortable" 
## [4] "Not so comfortable"     "I don't do this"        "Somewhat comfortably"  
## [7] "Not at all comfortable"

#unique(relComfort$RelConvo_Outside)
#unique(relComfort$RelConvo_Different)

B. Re-level for Order

By default, R orders the categories alphabetically, but we want the following order:

5: Extremely comfortable
4: Very comfortable
3: Somewhat comfortably
2: Not so comfortable
1: Not at all comfortable

### COMFORT LEVELS, LEAST TO MOST COMFORTABLE (SHARED BY ALL THREE QUESTIONS)
comfortLevels<-c("", "I don't do this", 
                 "Not at all comfortable", "Not so comfortable", 
                 "Somewhat comfortably", "Very comfortable", 
                 "Extremely comfortable")

relComfort$RelConvo_Comfort <- factor(relComfort$RelConvo_Comfort, 
                                      levels = comfortLevels, ordered = TRUE)

relComfort$RelConvo_Outside <- factor(relComfort$RelConvo_Outside, 
                                      levels = comfortLevels, ordered = TRUE)

relComfort$RelConvo_Different <- factor(relComfort$RelConvo_Different, 
                                        levels = comfortLevels, ordered = TRUE)

str(relComfort)

## 'data.frame':    1026 obs. of  5 variables:
##  $ Religion          : chr  "None of these" "Atheist" "Protestant" "Muslim" ...
##  $ RelConvo_Comfort  : Ord.factor w/ 7 levels ""<"I don't do this"<..: 6 1 7 6 4 2 2 2 6 6 ...
##  $ RelConvo_Outside  : Ord.factor w/ 7 levels ""<"I don't do this"<..: 5 1 5 7 4 4 2 2 2 5 ...
##  $ RelConvo_Different: Ord.factor w/ 7 levels ""<"I don't do this"<..: 7 5 7 6 4 5 7 3 4 6 ...
##  $ weight            : num  259992 308161 259992 326487 352224 ...
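
As an alternative sketch, the same re-leveling could be applied to every `RelConvo_` column in one step with `across()`. The data frame `toyComfort` and the vector `comfortOrder` below are hypothetical, made up for illustration:

```r
library(dplyr)

comfortOrder <- c("", "I don't do this",
                  "Not at all comfortable", "Not so comfortable",
                  "Somewhat comfortably", "Very comfortable",
                  "Extremely comfortable")

# Hypothetical miniature version of `relComfort`
toyComfort <- data.frame(
  RelConvo_Comfort = c("Very comfortable", "", "Not so comfortable"),
  RelConvo_Outside = c("Somewhat comfortably", "I don't do this", "Very comfortable"),
  weight           = c(1, 2, 3))

# Convert every column whose name starts with "RelConvo_" to an ordered factor
toyComfort <- toyComfort %>%
  mutate(across(starts_with("RelConvo_"),
                ~ factor(.x, levels = comfortOrder, ordered = TRUE)))

levels(toyComfort$RelConvo_Comfort)

# Ordered factors support comparisons that follow the level order
toyComfort$RelConvo_Comfort >= "Very comfortable"
```

The `weight` column is left untouched because its name does not match the prefix.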

C. Wrangle

To compare these perspectives by religion, we will need to do a little wrangling.

### WRANGLE
### PIVOT LONGER

relC_Long<-relComfort%>%
  pivot_longer(cols=RelConvo_Comfort:RelConvo_Different, 
               names_to="RelConvo", 
               values_to="Comfort")

head(relC_Long)

## # A tibble: 6 × 4
##   Religion       weight RelConvo           Comfort                
##   <chr>           <dbl> <chr>              <ord>                  
## 1 None of these 259992. RelConvo_Comfort   "Very comfortable"     
## 2 None of these 259992. RelConvo_Outside   "Somewhat comfortably" 
## 3 None of these 259992. RelConvo_Different "Extremely comfortable"
## 4 Atheist       308161. RelConvo_Comfort   ""                     
## 5 Atheist       308161. RelConvo_Outside   ""                     
## 6 Atheist       308161. RelConvo_Different "Somewhat comfortably"
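
The reshaping can be seen on a tiny made-up example (`toyWide` and `toyLong` are hypothetical names): each respondent goes from one row with three question columns to three rows, and the weight is repeated on each of them.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per respondent
toyWide <- data.frame(
  Religion           = c("Protestant", "Muslim"),
  RelConvo_Comfort   = c("Very comfortable", "Not so comfortable"),
  RelConvo_Outside   = c("Somewhat comfortably", "Very comfortable"),
  RelConvo_Different = c("Extremely comfortable", "Very comfortable"),
  weight             = c(259992, 326487))

toyLong <- toyWide %>%
  pivot_longer(cols = RelConvo_Comfort:RelConvo_Different,
               names_to = "RelConvo",
               values_to = "Comfort")

# Two respondents times three questions gives six rows
dim(toyLong)
```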

D. Sample Graphics

### FACETED BAR
relC_Long%>%
  filter(! Comfort %in% c("", "I don't do this"))%>%
  ggplot(aes(x=Religion, fill=Comfort))+
  geom_bar(position = "fill")+
  coord_flip()+
  facet_grid(.~RelConvo)

E. Weighted Graphics

### WEIGHTED FACETED BAR
relC_Long%>%
  filter(! Comfort %in% c("", "I don't do this"))%>%
  ggplot(aes(x=Religion, y=weight, fill=Comfort))+
  geom_bar(stat="identity", position = "fill")+
  coord_flip()+
  facet_grid(.~RelConvo)
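
With `y=weight` and `position = "fill"`, each bar segment shows a response's share of the total weight rather than its share of respondents. A minimal sketch of that calculation on made-up numbers (`toyLong` and `toyShares` here are hypothetical):

```r
library(dplyr)

# Hypothetical long-format data: one religion, one question
toyLong <- data.frame(
  Religion = "Protestant",
  Comfort  = c("Very comfortable", "Very comfortable", "Not so comfortable"),
  weight   = c(100, 100, 300))

toyShares <- toyLong %>%
  group_by(Religion, Comfort) %>%
  summarise(n = n(), wgtN = sum(weight), .groups = "drop_last") %>%
  mutate(sampShare = n / sum(n),        # what the unweighted bar shows
         wgtShare  = wgtN / sum(wgtN))  # what the weighted bar shows

toyShares
```

Here "Not so comfortable" is one of three respondents but carries 300 of the 500 total weight, so its segment grows once weights are applied.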

F. Change in Color Palette

This color palette is designed to be accessible to color-blind readers, unlike the red-green palettes often used in heatmaps.

### TAN TEAL PALETTE
tanteal<-c("#A6761D", "#D9B979", "#FBEEC2", "#80CDC1", "#01665E")


### WEIGHTED FACETED BAR
relC_Long%>%
  filter(! Comfort %in% c("", "I don't do this"))%>%
  ggplot(aes(x=Religion, y=weight, fill=Comfort))+
  geom_bar(stat="identity", position = "fill")+
  coord_flip()+
  facet_grid(.~RelConvo)+
  scale_fill_manual(values=tanteal)+
  theme_bw()
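
An unnamed palette is matched to the response levels by position. One possible safeguard, sketched here as an assumption rather than as part of the original lesson, is to name each color so that a level keeps its color even if another level is absent from the data (`comfortScale` and `tantealNamed` are hypothetical names):

```r
tanteal <- c("#A6761D", "#D9B979", "#FBEEC2", "#80CDC1", "#01665E")

# The five response levels, least to most comfortable
comfortScale <- c("Not at all comfortable", "Not so comfortable",
                  "Somewhat comfortably", "Very comfortable",
                  "Extremely comfortable")

# Named vector: each response level is tied to one color
tantealNamed <- setNames(tanteal, comfortScale)
tantealNamed

# Hypothetical usage inside a plot:
#   scale_fill_manual(values = tantealNamed)
```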

G. Ordering the Groups

By default, all categories are ordered alphabetically; however, we might want them ordered by a variable we are observing. Let’s do this by wrangling the data.

First, we group the two categories with the most positive sentiment.

### INDICATOR FOR POSITIVE SENTIMENT
relC_Pos<-relC_Long%>%
  mutate(pos = (Comfort %in% c("Very comfortable","Extremely comfortable")))

#head(relC_Pos)

## NOTE THIS CAN ALSO BE DONE WITH A CASE_WHEN
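
A minimal sketch of the `case_when()` version mentioned in the note, on a made-up column of responses (`toyLong` and `toyPos` are hypothetical):

```r
library(dplyr)

# Hypothetical responses, including a blank
toyLong <- data.frame(
  Comfort = c("Extremely comfortable", "Very comfortable",
              "Somewhat comfortably", "Not so comfortable", ""))

toyPos <- toyLong %>%
  mutate(pos = case_when(
    Comfort %in% c("Very comfortable", "Extremely comfortable") ~ TRUE,
    TRUE ~ FALSE))  # everything else, including blanks, is not positive

toyPos
```

For a two-way split the `%in%` test alone is shorter; `case_when()` becomes more useful if more than two sentiment groups are needed.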

Next, we want to count how many are in each category. We can do this with the raw survey data or with the weighted values. I will do both to demonstrate how this can be done.

#### POSITIVE PROP 
posProp<-relC_Pos%>%
  filter(! Comfort %in% c("", "I don't do this"))%>%
  group_by(Religion, RelConvo)%>%
  summarise(sampN=n(), # SAMPLE SIZE
            wgtN=sum(weight), # SUM OF WEIGHTS IS EST POP SIZE
            posCount=sum(pos),
            sampPosProp=mean(pos), # SAMPLE PROPORTION
            wgtPosProp=sum(weight*pos)/wgtN) # WEIGHTED SAMPLE PROP

## `summarise()` has grouped output by 'Religion'. You can override using the
## `.groups` argument.

#View(posProp)
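
The difference between the two proportions can be checked by hand on a few made-up respondents (the values below are hypothetical). The weighted proportion is the sum of the weights of positive respondents divided by the sum of all weights.

```r
# Hypothetical respondents: positive indicator and survey weight
pos    <- c(TRUE, FALSE, TRUE, FALSE)
weight <- c(100, 300, 100, 500)

sampPosProp <- mean(pos)                        # 2 of 4 respondents
wgtPosProp  <- sum(weight * pos) / sum(weight)  # 200 of 1000 weighted people

sampPosProp
wgtPosProp

# Base R's weighted.mean() gives the same weighted value
all.equal(wgtPosProp, weighted.mean(pos, weight))
```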

In our previous example, we looked at three questions at the same time in order to compare them. Here we need to just pick one question so that we can assess the order. There isn’t one order across all questions. We are going to need to filter.

#### FILTER AND ARRANGE
posComfort<-posProp%>%
  filter(RelConvo=="RelConvo_Comfort")%>%
  arrange(sampPosProp) # THIS IS THE ORDER WE WANT

#View(posComfort)
posComfort$Religion

##  [1] "Agnostic"           "Jewish"             "Orthodox Christian"
##  [4] "None of these"      "Protestant"         "Mormon"            
##  [7] "Roman Catholic"     "Buddhist"           "Muslim"            
## [10] "Hindu"

Now we can reorder the religion variable.

#### RELEVEL
relC_Long$Religion <- factor(relC_Long$Religion, levels = c(posComfort$Religion))
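
One side effect is worth noting: any value that is missing from `levels` becomes `NA`. The ordering printed above has no "Atheist" entry, so those rows turn into `NA` after this step. A small sketch with made-up values (`religion`, `wantOrder`, and `religionF` are hypothetical):

```r
# Hypothetical religions; "Atheist" is absent from the ordering vector
religion  <- c("Hindu", "Atheist", "Agnostic", "Hindu")
wantOrder <- c("Agnostic", "Hindu")

religionF <- factor(religion, levels = wantOrder)

levels(religionF)  # plotting order follows this vector
is.na(religionF)   # values missing from `levels` become NA
```

A later `filter(Religion != "Atheist")` still removes those rows, because `filter()` drops rows where the condition evaluates to `NA`.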

Finally, we can plot again!

### PLOT AGAIN!
relC_Long%>%
  filter(! Comfort %in% c("", "I don't do this"))%>%
  filter(Religion !="Atheist")%>%
  #filter(RelConvo=="RelConvo_Comfort")%>%
  ggplot(aes(x=Religion, fill=Comfort))+
  geom_bar(position = "fill")+
  coord_flip()+
  facet_grid(.~RelConvo)+
  scale_fill_manual(values=tanteal)+
  theme_bw()

### ONLY COMFORT
relC_Long%>%
  filter(! Comfort %in% c("", "I don't do this"))%>%
  filter(Religion !="Atheist")%>%
  filter(RelConvo=="RelConvo_Comfort")%>%
  ggplot(aes(x=Religion, fill=Comfort))+
  geom_bar(position = "fill")+
  coord_flip()+
  #facet_grid(.~RelConvo)+
  scale_fill_manual(values=tanteal)+
  theme_bw()