First, I’ll load in the data and rename the columns.
library(dplyr)
library(tidyverse)
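# stringsAsFactors = TRUE keeps the pre-R-4.0 default, matching the <fct> columns in the output below
religionData <- read.csv('religionData.csv', header = TRUE, sep = ',', stringsAsFactors = TRUE)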
religionData <- as_tibble(religionData)
# rename columns
colNames <- c('RELIGION','RELIGION2', 'EVANGELICAL', 'RELIGIOUS_SERVICES', 'FREQ_PRAY_WITH_MOTIONS', 'FREQ_PRAY_WITH_OBJECTS', 'FREQ_PRAY_BEFORE_MEALS',
'FREQ_PRAY_FOR_OTHERS', 'FREQ_ASK_TO_PRAY_WITH_SOMEONE', 'FREQ_BRING_UP_RELIGION',
'FREQ_ASK_ABOUT_RELIGION', 'FREQ_DECLINE_FOOD_FOR_RELIGION', 'FREQ_WEAR_RELIGIOUS_CLOTHING', 'FREQ_PARTICIPATE_IN_PUBLIC_RELIGIOUS_EVENT',
'COMFORT_OWN_PRAY_WITH_MOTIONS',
'COMFORT_OWN_PRAY_WITH_OBJECTS',
'COMFORT_OWN_PRAY_BEFORE_MEALS',
'COMFORT_OWN_PRAY_FOR_OTHERS',
'COMFORT_OWN_ASK_TO_PRAY_WITH_SOMEONE',
'COMFORT_OWN_BRING_UP_RELIGION',
'COMFORT_OWN_ASK_ABOUT_RELIGION',
'COMFORT_OWN_DECLINE_FOOD_FOR_RELIGION',
'COMFORT_OWN_WEAR_RELIGIOUS_CLOTHING',
'COMFORT_OWN_PARTICIPATE_IN_PUBLIC_RELIGIOUS_EVENT',
'COMFORT_OTHER_PRAY_WITH_MOTIONS', 'COMFORT_OTHER_PRAY_WITH_OBJECTS', 'COMFORT_OTHER_PRAY_BEFORE_MEALS', 'COMFORT_OTHER_PRAY_FOR_OTHERS', 'COMFORT_OTHER_ASK_TO_PRAY_WITH_SOMEONE', 'COMFORT_OTHER_BRING_UP_RELIGION', 'COMFORT_OTHER_ASK_ABOUT_RELIGION', 'COMFORT_OTHER_DECLINE_FOOD_FOR_RELIGION',
'COMFORT_OTHER_WEAR_RELIGIOUS_CLOTHING', 'COMFORT_OTHER_PARTICIPATE_IN_PUBLIC_RELIGIOUS_EVENT', 'COMFORT_SEE_OTHER_PRAY_WITH_MOTIONS', 'COMFORT_SEE_OTHER_PRAY_WITH_OBJECTS', 'COMFORT_SEE_OTHER_PRAY_BEFORE_MEALS', 'COMFORT_SEE_OTHER_PRAY_FOR_OTHERS', 'COMFORT_SEE_OTHER_ASK_TO_PRAY_WITH_SOMEONE', 'COMFORT_SEE_OTHER_BRING_UP_RELIGION',
'COMFORT_SEE_OTHER_ASK_ABOUT_RELIGION', 'COMFORT_SEE_OTHER_DECLINE_FOOD_FOR_RELIGION', 'COMFORT_SEE_OTHER_WEAR_RELIGIOUS_CLOTHING', 'COMFORT_SEE_OTHER_PARTICIPATE_IN_PUBLIC_RELIGIOUS_EVENT', 'AGE', 'GENDER', 'HOUSEHOLD_SALARY', 'US_REGION')
names(religionData) <- colNames
This dataset has many fields, so I will keep only the general demographics and the survey responses about how comfortable respondents are seeing religious actions by people outside their own religion.
colsToKeep <- c('COMFORT_SEE_OTHER_PRAY_WITH_MOTIONS', 'COMFORT_SEE_OTHER_PRAY_WITH_OBJECTS', 'COMFORT_SEE_OTHER_PRAY_BEFORE_MEALS', 'COMFORT_SEE_OTHER_PRAY_FOR_OTHERS', 'COMFORT_SEE_OTHER_ASK_TO_PRAY_WITH_SOMEONE', 'COMFORT_SEE_OTHER_BRING_UP_RELIGION',
'COMFORT_SEE_OTHER_ASK_ABOUT_RELIGION', 'COMFORT_SEE_OTHER_DECLINE_FOOD_FOR_RELIGION', 'COMFORT_SEE_OTHER_WEAR_RELIGIOUS_CLOTHING', 'COMFORT_SEE_OTHER_PARTICIPATE_IN_PUBLIC_RELIGIOUS_EVENT', 'AGE', 'GENDER', 'HOUSEHOLD_SALARY', 'US_REGION')
religionData <- religionData %>%
  # drop the extra header row, then keep only the columns of interest
  filter(RELIGION != 'Response') %>%
  select(all_of(colsToKeep))
To quantify each individual’s comfort with public religious displays, I need to convert the categorical responses to numeric values. First, I will turn each survey response into an ordered factor ranked by comfort, with "Not at all comfortable" as the lowest level and "Extremely comfortable" as the highest.
# the first 10 columns of the subset are the comfort questions
comfortColumns <- 1:10
comfortLevels <- c("", "Response", "Not at all comfortable", "Not so comfortable", "Somewhat comfortably", "Very comfortable", "Extremely comfortable")
for (i in comfortColumns) {
  religionData[[i]] <- factor(religionData[[i]], levels = comfortLevels, ordered = TRUE)
}
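Any response label that does not appear in `levels` is silently converted to `NA` by `factor()`, so it can be worth checking for unmatched labels before converting. The sketch below uses hypothetical toy labels (not the survey data) to show the check and the ordering that an ordered factor provides.

```r
# Hypothetical toy labels; the last one is a deliberate misspelling
lvls <- c("Not at all comfortable", "Not so comfortable", "Very comfortable")
raw <- c("Very comfortable", "Not so comfortable", "Not so comfy")

ranked <- factor(raw, levels = lvls, ordered = TRUE)

# Labels present in the data but absent from the level vector
unmatched <- setdiff(unique(raw), lvls)
print(unmatched)

# Ordered factors support comparisons that follow the level order
print(ranked[2] < ranked[1])

# The unmatched label was converted to NA
print(sum(is.na(ranked)))
```

Running the same `setdiff()` check against each survey column would flag any label that the level vector fails to cover.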
# eliminate records where any of the survey responses are blank
# (a logical mask is used so that nothing is dropped when no blanks exist)
hasBlank <- rowSums(religionData[comfortColumns] == "", na.rm = TRUE) > 0
religionData <- religionData[!hasBlank, ]
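A general caveat when dropping rows by position: a negative index built from `which()` removes every row when nothing matches, because `-integer(0)` is still an empty index. A logical mask does not have this problem. The toy data frame below is hypothetical and only illustrates the difference.

```r
# Hypothetical toy data with no blank responses
toy <- data.frame(q1 = c("a", "b"), q2 = c("c", "d"))

# Position-based removal: which() finds nothing, so the index is empty
toBeRemoved <- which(toy$q1 == "" | toy$q2 == "")
byPosition <- toy[-toBeRemoved, ]

# Mask-based removal: rows are kept unless they contain a blank
hasBlank <- toy$q1 == "" | toy$q2 == ""
byMask <- toy[!hasBlank, ]

print(nrow(byPosition))
print(nrow(byMask))
```

The position-based version returns an empty data frame here, while the mask keeps both rows.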
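The survey responses are still categorical, so I need to convert them to numeric values. Since the factors are already ordered, the numeric codes preserve the hierarchy. Note that the level vector also contains the two placeholder levels "" and "Response", so the codes run from 3 ("Not at all comfortable") to 7 ("Extremely comfortable") rather than from 1 to 5. I’ll also compute each respondent’s average rating across the ten survey questions and store it in a new variable called AVERAGE_RATING.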
religionData$AVERAGE_RATING <- 0
# loop through all survey questions and convert each response to a number
# add number to the AVERAGE_RATING column
for (i in comfortColumns) {
  religionData[[i]] <- as.numeric(religionData[[i]])
  religionData$AVERAGE_RATING <- religionData$AVERAGE_RATING + religionData[[i]]
}
# final average rating
religionData$AVERAGE_RATING <- religionData$AVERAGE_RATING/10
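As a small illustration of the conversion, the sketch below applies `as.numeric()` to an ordered factor built with the same level vector and shows the resulting codes. The two responses and the `q1`/`q2` columns are hypothetical. It also shows `rowMeans()` as one possible alternative to accumulating a sum in a loop, and an optional shift to a 1 to 5 scale, which is an assumption about presentation and not something the analysis above does.

```r
# Same level vector as in the conversion step above
lvls <- c("", "Response", "Not at all comfortable", "Not so comfortable",
          "Somewhat comfortably", "Very comfortable", "Extremely comfortable")

# Two hypothetical responses: the lowest and highest comfort labels
toy <- factor(c("Not at all comfortable", "Extremely comfortable"),
              levels = lvls, ordered = TRUE)

# as.numeric() returns each value's position in the level vector
codes <- as.numeric(toy)
print(codes)

# Optional: subtract the two placeholder levels to get a 1 to 5 scale
rescaled <- codes - 2
print(rescaled)

# rowMeans() averages across question columns in one call
toyScores <- data.frame(q1 = c(3, 7), q2 = c(5, 6))
toyScores$AVERAGE_RATING <- rowMeans(toyScores[c("q1", "q2")])
print(toyScores$AVERAGE_RATING)
```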
You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.
Are age, income, and region predictive of an individual’s comfort level with public religious displays?
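One possible way to examine this question later is a linear model with the categorical predictors as factors. This is only a sketch under stated assumptions: the data frame `toy` is made up for illustration, its ratings are random draws, and the choice of `lm()` is an assumption, not a method this proposal commits to.

```r
set.seed(1)

# Hypothetical toy data that mimics the structure of the cleaned survey
toy <- data.frame(
  AGE = factor(rep(c("18 - 29", "30 - 44", "45 - 59", "60+"), times = 50)),
  US_REGION = factor(rep(c("Pacific", "Mountain", "New England"), length.out = 200)),
  AVERAGE_RATING = round(runif(200, min = 3, max = 7), 1)
)

# Fit the rating on the categorical predictors; R dummy-codes the factors
fit <- lm(AVERAGE_RATING ~ AGE + US_REGION, data = toy)

# Sequential F-tests for each predictor
print(anova(fit))
```

With the real data, `HOUSEHOLD_SALARY` could be added to the formula in the same way.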
What are the cases, and how many are there?
Each case represents a respondent’s survey answers. There are a total of 979 cases in the dataset.
Describe the method of data collection.
This data was collected using a SurveyMonkey poll, conducted between July 29 and August 1, 2016. The survey asked 661 respondents questions about public displays of religion.
What type of study is this (observational/experiment)?
This is a survey, which is a type of observational study.
If you collected the data, state self-collected. If not, provide a citation/link.
This data came from: https://github.com/fivethirtyeight/data/tree/master/religion-survey.
What is the response variable? Is it quantitative or qualitative?
The response variable is the respondent’s average comfort rating (AVERAGE_RATING), and it is quantitative.
You should have two independent variables, one quantitative and one qualitative.
Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
summary(religionData$AVERAGE_RATING)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 5.700 5.597 6.200 7.000
table(religionData$AGE)
##
##  18 - 29  30 - 44  45 - 59      60+ Response
##      212      255      264      248        0
table(religionData$HOUSEHOLD_SALARY)
##
##         $0 to $9,999   $10,000 to $24,999 $100,000 to $124,999
##                   86                   93                   87
## $125,000 to $149,999 $150,000 to $174,999 $175,000 to $199,999
##                   52                   37                   15
##      $200,000 and up   $25,000 to $49,999   $50,000 to $74,999
##                   53                  166                  151
##   $75,000 to $99,999 Prefer not to answer             Response
##                  111                  128                    0
table(religionData$US_REGION)
##
##                    East North Central East South Central
##                 13                164                 50
##    Middle Atlantic           Mountain        New England
##                124                 67                 65
##            Pacific           Response     South Atlantic
##                146                  0                191
## West North Central West South Central
##                 68                 91
religionData %>%
group_by(AGE) %>%
summarise(MEAN_BY_AGE = mean(AVERAGE_RATING),
MEDIAN_BY_AGE = median(AVERAGE_RATING),
STDEV_BY_AGE = sd(AVERAGE_RATING))
## # A tibble: 4 x 4
## AGE MEAN_BY_AGE MEDIAN_BY_AGE STDEV_BY_AGE
## <fct> <dbl> <dbl> <dbl>
## 1 18 - 29 5.67 5.75 0.966
## 2 30 - 44 5.54 5.6 1.02
## 3 45 - 59 5.61 5.7 0.938
## 4 60+ 5.59 5.6 0.808
boxplot(religionData$AVERAGE_RATING~religionData$AGE)
religionData %>%
group_by(HOUSEHOLD_SALARY) %>%
summarise(MEAN_BY_SALARY = mean(AVERAGE_RATING),
MEDIAN_BY_SALARY = median(AVERAGE_RATING),
STDEV_BY_SALARY = sd(AVERAGE_RATING))
## # A tibble: 11 x 4
## HOUSEHOLD_SALARY MEAN_BY_SALARY MEDIAN_BY_SALARY STDEV_BY_SALARY
## <fct> <dbl> <dbl> <dbl>
## 1 $0 to $9,999 5.54 5.75 1.16
## 2 $10,000 to $24,999 5.68 5.8 0.937
## 3 $100,000 to $124,999 5.54 5.6 0.939
## 4 $125,000 to $149,999 5.58 5.7 0.983
## 5 $150,000 to $174,999 5.45 5.6 1.01
## 6 $175,000 to $199,999 5.61 5.7 0.728
## 7 $200,000 and up 5.60 5.7 0.855
## 8 $25,000 to $49,999 5.63 5.8 1.01
## 9 $50,000 to $74,999 5.76 5.9 0.832
## 10 $75,000 to $99,999 5.47 5.6 0.945
## 11 Prefer not to answer 5.53 5.55 0.778
boxplot(religionData$AVERAGE_RATING~religionData$HOUSEHOLD_SALARY)
religionData %>%
group_by(US_REGION) %>%
summarise(MEAN_BY_REGION = mean(AVERAGE_RATING),
MEDIAN_BY_REGION = median(AVERAGE_RATING),
STDEV_BY_REGION = sd(AVERAGE_RATING))
## # A tibble: 10 x 4
## US_REGION MEAN_BY_REGION MEDIAN_BY_REGION STDEV_BY_REGION
## <fct> <dbl> <dbl> <dbl>
## 1 "" 5.19 5.3 1.07
## 2 East North Central 5.69 5.9 0.986
## 3 East South Central 5.85 5.95 0.735
## 4 Middle Atlantic 5.56 5.6 0.916
## 5 Mountain 5.41 5.5 1.05
## 6 New England 5.69 5.7 0.782
## 7 Pacific 5.59 5.7 1.02
## 8 South Atlantic 5.58 5.6 0.953
## 9 West North Central 5.51 5.65 0.880
## 10 West South Central 5.59 5.6 0.807
boxplot(religionData$AVERAGE_RATING~religionData$US_REGION)
religionData %>%
group_by(GENDER) %>%
summarise(MEAN_BY_GENDER = mean(AVERAGE_RATING),
MEDIAN_BY_GENDER = median(AVERAGE_RATING),
STDEV_BY_GENDER = sd(AVERAGE_RATING))
## # A tibble: 2 x 4
## GENDER MEAN_BY_GENDER MEDIAN_BY_GENDER STDEV_BY_GENDER
## <fct> <dbl> <dbl> <dbl>
## 1 Female 5.70 5.8 0.852
## 2 Male 5.49 5.6 1.01
boxplot(religionData$AVERAGE_RATING~religionData$GENDER)
religionData %>%
group_by(AGE,GENDER) %>%
summarise(MEAN_BY_AGE_GENDER = mean(AVERAGE_RATING),
MEDIAN_BY_AGE_GENDER = median(AVERAGE_RATING),
STDEV_BY_AGE_GENDER = sd(AVERAGE_RATING))
## # A tibble: 8 x 5
## # Groups: AGE [4]
## AGE GENDER MEAN_BY_AGE_GENDER MEDIAN_BY_AGE_GENDER STDEV_BY_AGE_GEND~
## <fct> <fct> <dbl> <dbl> <dbl>
## 1 18 - 29 Female 5.77 5.9 0.909
## 2 18 - 29 Male 5.55 5.5 1.01
## 3 30 - 44 Female 5.61 5.7 0.872
## 4 30 - 44 Male 5.46 5.5 1.16
## 5 45 - 59 Female 5.73 5.8 0.844
## 6 45 - 59 Male 5.48 5.65 1.02
## 7 60+ Female 5.68 5.7 0.794
## 8 60+ Male 5.47 5.55 0.813
ggplot(religionData, aes(x=AVERAGE_RATING))+ geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.