The data comes from Social Explorer, which is subscription-based. To access it, go to Ryan Clement’s data site and click “Social Explorer.” Let’s load the data:
library(readxl)  # provides read_excel()
SE <- read_excel("SE_data.xlsx")
The variables we will use are the following:
-adherents = percentage of population that is affiliated with a religious body (includes members, children, and those who regularly attend a worship service)
-evangelical = percentage of population that identifies as Christian evangelical
-crime_violent = total violent crimes (murder, manslaughter, rape, aggravated assault) per 100,000
-crime_property = total property crimes (burglaries, larcenies, motor vehicle thefts) per 100,000
-crime_hate = total hate crimes (anti-race, anti-ethnicity, anti-ancestry, anti-religious, anti-sexual orientation, anti-disability, anti-gender) per 100,000
-income_median = Median Household Income (In 2018 Inflation Adjusted Dollars)
-covid_cases = Cumulative Confirmed Cases Rate per 100,000
-covid_deaths = Cumulative Death Rate per 100,000
Each observation (or row) is a U.S. state or the District of Columbia.
head(SE, n=10)
## # A tibble: 10 x 10
## Geo_FIPS Geo_NAME adherents evangelical crime_violent crime_property
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1 Alabama 62.9 42.0 520. 2817.
## 2 2 Alaska 33.9 14.2 885. 3300.
## 3 4 Arizona 37.2 11.9 475. 2677.
## 4 5 Arkansas 55.4 39.0 544. 2913.
## 5 6 California 45.0 9.40 447. 2380.
## 6 8 Colorado 37.8 12.0 397. 2672.
## 7 9 Connecticut 51.2 4.40 207. 1681.
## 8 10 Delaware 41.8 7.20 424. 2324.
## 9 11 District Of Columbia 55.2 12.5 996. 4374.
## 10 12 Florida 39.1 16.2 385. 2282.
## # … with 4 more variables: crime_hate <dbl>, income_median <dbl>,
## # covid_cases <dbl>, covid_deaths <dbl>
The first state is Alabama, where about 63% of the population is a religious adherent and 42% identifies as Christian evangelical.
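Reading a value off the printed table works for ten rows, but pulling a single row by name scales better. The following is a minimal sketch, assuming only the first three rows shown by head() above (typed in by hand, rounded as displayed); the name SE_head is introduced here for illustration and is not part of the lab data.

```r
# First three rows as printed by head(SE, n = 10) above; values are rounded as displayed.
# SE_head is a hypothetical stand-in for the full SE data.
SE_head <- data.frame(
  Geo_NAME    = c("Alabama", "Alaska", "Arizona"),
  adherents   = c(62.9, 33.9, 37.2),
  evangelical = c(42.0, 14.2, 11.9)
)

# Keep only the Alabama row and the two religion columns.
alabama <- subset(SE_head, Geo_NAME == "Alabama",
                  select = c(Geo_NAME, adherents, evangelical))
print(alabama)
```

With the full data loaded, the same call should work with SE in place of SE_head and any other state name in the condition.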
Q1: What is the adherent rate in California? What percentage identify as evangelical?
Let’s look at summary statistics:
summary(SE)
## Geo_FIPS Geo_NAME adherents evangelical
## Min. : 1.00 Length:51 Min. :27.63 Min. : 2.281
## 1st Qu.:16.50 Class :character 1st Qu.:40.45 1st Qu.: 9.489
## Median :29.00 Mode :character Median :50.83 Median :12.922
## Mean :28.96 Mean :48.47 Mean :15.945
## 3rd Qu.:41.50 3rd Qu.:55.33 3rd Qu.:19.120
## Max. :56.00 Max. :79.11 Max. :42.041
##
## crime_violent crime_property crime_hate income_median
## Min. :112.1 Min. :1248 Min. : 0.1674 Min. :43567
## 1st Qu.:241.5 1st Qu.:1673 1st Qu.: 0.9492 1st Qu.:53178
## Median :350.5 Median :2282 Median : 1.5990 Median :59116
## Mean :378.0 Mean :2246 Mean : 2.4992 Mean :60621
## 3rd Qu.:457.7 3rd Qu.:2674 3rd Qu.: 2.4815 3rd Qu.:68392
## Max. :995.9 Max. :4374 Max. :25.4821 Max. :82604
## NA's :2
## covid_cases covid_deaths
## Min. : 2472 Min. : 35.02
## 1st Qu.: 9583 1st Qu.:131.38
## Median :10641 Median :175.85
## Mean :10157 Mean :169.21
## 3rd Qu.:11441 3rd Qu.:205.55
## Max. :14641 Max. :295.58
##
How do you interpret the table? Take income_median: across the 50 states plus the District of Columbia, the average of the state median household incomes is $60,621. The lowest median income is $43,567 and the highest is $82,604. We can order the states from highest to lowest income:
arrange(SE, desc(income_median))
## # A tibble: 51 x 10
## Geo_FIPS Geo_NAME adherents evangelical crime_violent crime_property
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 11 District Of Columbia 55.2 12.5 996. 4374.
## 2 24 Maryland 41.8 12.0 469. 2033.
## 3 34 New Jersey 54.7 4.33 208. 1405.
## 4 15 Hawaii 41.3 9.58 249. 2870.
## 5 25 Massachusetts 57.2 3.43 338. 1263.
## 6 2 Alaska 33.9 14.2 885. 3300.
## 7 9 Connecticut 51.2 4.40 207. 1681.
## 8 33 New Hampshire 35.2 3.58 173. 1248.
## 9 51 Virginia 44.8 19.1 200. 1666.
## 10 6 California 45.0 9.40 447. 2380.
## # … with 41 more rows, and 4 more variables: crime_hate <dbl>,
## # income_median <dbl>, covid_cases <dbl>, covid_deaths <dbl>
The District of Columbia has the highest median income, followed by Maryland and New Jersey. If you go to the bottom of the table (click on “5” or “6”), you will see that Arkansas, West Virginia, and Mississippi have the lowest median incomes.
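Instead of paging to the end of the sorted table, the bottom rows can be printed directly. The sketch below is illustrative only: the data frame toy, its placeholder state names, and its income values are all made up for the example and are not taken from SE.

```r
library(dplyr)

# Hypothetical toy data: placeholder names and made-up incomes, NOT values from SE.
toy <- data.frame(
  Geo_NAME      = c("State A", "State B", "State C", "State D", "State E"),
  income_median = c(52000, 78000, 45000, 61000, 48000)
)

# Sort from highest to lowest income, then keep the last three rows,
# which are the three lowest incomes.
lowest <- tail(arrange(toy, desc(income_median)), n = 3)
print(lowest)
```

Applied to the lab data, the equivalent call would be tail(arrange(SE, desc(income_median)), n = 3).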
Q2: Which states have the highest rates of adherence? Which states have the highest percentage of evangelicals?
Do states with a higher rate of religious adherence have lower or higher rates of crime?
One of the best ways to examine this association is to use a scatterplot.
ggplot(data = SE, aes(x = adherents, y = crime_violent)) +
geom_point()
Each dot is a state. The x-axis is the percentage of adherents, while the y-axis is the number of violent crimes per 100,000. With just the scatterplot, it can be difficult to discern whether there is a pattern. Let’s add a linear regression line:
ggplot(data = SE, aes(x = adherents, y = crime_violent)) +
geom_point() + geom_smooth(method ='lm')
## `geom_smooth()` using formula 'y ~ x'
It looks like there is a slightly positive association between religious adherence and violent crime: as adherence increases, so does violent crime. Let’s format the figure.
ggplot(data = SE, aes(x = adherents, y = crime_violent)) +
geom_point() + geom_smooth(method ='lm') +
ggtitle("Religiosity and Violent Crime") +
xlab("Pct Religious Adherent") +
ylab("Violent Crimes per 100,000")
## `geom_smooth()` using formula 'y ~ x'
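A correlation coefficient puts a single number on the pattern in the scatterplot. The following is a minimal sketch that uses only the ten rows printed by head() earlier (typed in by hand, rounded as displayed), so its result describes those ten rows, not all 51 observations.

```r
# First ten rows as printed by head(SE, n = 10) above (rounded as displayed).
adherents     <- c(62.9, 33.9, 37.2, 55.4, 45.0, 37.8, 51.2, 41.8, 55.2, 39.1)
crime_violent <- c(520, 885, 475, 544, 447, 397, 207, 424, 996, 385)

# Pearson correlation: ranges from -1 to 1; values near 0 mean
# little linear association between the two variables.
r <- cor(adherents, crime_violent)
print(r)
```

With the full data, the call would be cor(SE$adherents, SE$crime_violent). For a column that contains missing values, adding use = "complete.obs" drops the incomplete rows first.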
We can also estimate the linear regression to see how “strong” the association is between adherents and violent crime.
result.1<-lm(crime_violent ~ adherents, data = SE)
summary(result.1)
##
## Call:
## lm(formula = crime_violent ~ adherents, data = SE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -246.88 -142.90 -29.97 77.71 611.82
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 333.9191 123.4762 2.704 0.00938 **
## adherents 0.9089 2.4929 0.365 0.71698
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 181.1 on 49 degrees of freedom
## Multiple R-squared: 0.002706, Adjusted R-squared: -0.01765
## F-statistic: 0.1329 on 1 and 49 DF, p-value: 0.717
We see that the slope of the regression line is 0.9089: a one-percentage-point increase in adherence is associated with about 0.91 more violent crimes per 100,000. However, the slope is NOT statistically significant at the 5% level (p-value = 0.717), which is why there are no stars (*) in its row. Given that it is not statistically significant, we conclude that these data provide no evidence that greater adherence is associated with more or less violent crime.
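The numbers in the coefficient table are linked: the t value is the estimate divided by its standard error, so for the output above 0.9089 / 2.4929 ≈ 0.365. The sketch below shows one way to pull these quantities out of a fitted model. It assumes only the ten rows printed by head() earlier (typed in by hand, rounded as displayed), so its estimates will differ from the 51-observation results; SE_head and fit are names introduced here for illustration.

```r
# First ten rows as printed by head(SE, n = 10) above (rounded as displayed).
# SE_head is a hypothetical stand-in for the full SE data.
SE_head <- data.frame(
  adherents     = c(62.9, 33.9, 37.2, 55.4, 45.0, 37.8, 51.2, 41.8, 55.2, 39.1),
  crime_violent = c(520, 885, 475, 544, 447, 397, 207, 424, 996, 385)
)

fit <- lm(crime_violent ~ adherents, data = SE_head)

# Coefficient table: one row per term, with columns
# Estimate, Std. Error, t value, and Pr(>|t|).
coefs   <- coef(summary(fit))
slope   <- coefs["adherents", "Estimate"]
se      <- coefs["adherents", "Std. Error"]
t_value <- coefs["adherents", "t value"]
p_value <- coefs["adherents", "Pr(>|t|)"]

# The t value is the estimate divided by its standard error.
print(all.equal(t_value, slope / se))

# 95% confidence interval for the slope; an interval that contains 0
# corresponds to a p-value above 0.05.
ci <- confint(fit, "adherents", level = 0.95)
print(ci)
```

On the full data, the same steps apply to result.1 in place of fit.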
Q3: Using both a scatter plot with a linear regression line AND a linear regression, is there any evidence that evangelical Christianity is associated with more or less property crime?
Q4: Based on your results for Q3, can we infer that Evangelical Christianity is causing a change in property crime? Why or why not?
Q5: Using the methods above, is there evidence that religiosity is associated with higher income?
Q6: Based on the results for crime and income, should society be encouraging religious participation?