Setup

Load packages

Here are the R packages I used.

library(ggplot2)
library(dplyr)
library(statsr)

Load data

load("gss.Rdata")

Part 1: Data

The sample in this investigation is not fully random: it includes only people who agreed to take part, so anyone who declined to participate is excluded. My analyses examine the relationship between opinions about halting the rising crime rate (natcrime) and a person’s reported race. Although this relationship is informative, the data are observational, so the analysis cannot establish directionality (that is, one cannot claim a causal link between race and opinions about rising crime levels), and it cannot rule out that variables I did not include (e.g., socioeconomic status, cohort) influence the selected dependent variable.


Part 2: Research question

Research question: Is a person’s race associated with his/her opinion on how the government handles crime?

Null hypothesis: There is no relationship between a person’s race and his/her opinion about halting the rising crime rate.

Alternative hypothesis: There is a relationship between a person’s race and his/her opinion about halting the rising crime rate.


Part 3: Exploratory data analysis

gss %>%
  ggplot(aes(race, fill = natcrime)) +  
  geom_bar(position = "fill")+
  theme_minimal() +             
  labs(                         
    title    = "General Social Survey", 
    subtitle = "Race and Crime",
    y        = "Proportion of Subjects",
    x        = "Race")

Before I conducted my statistical test, I used a bar chart to visualize the data. The x-axis represents the person’s race (White, Black, and Other), and the y-axis represents the proportion of subjects. The color fill in each bar represents the four responses to the natcrime question (Too Little, About Right, Too Much, and NA). This data visualization revealed there were many NA responses (gray color).

# Keep only respondents who answered the natcrime question
RemovedNA <- filter(gss, !is.na(natcrime))

Because there were many NA responses, I decided to remove subjects who did not respond to this question. This line of code filters the original data to exclude these subjects and stores the result in a new data frame that I named RemovedNA.
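
To illustrate how missing responses get dropped, here is a minimal sketch on a made-up data frame. The name toy and its values are hypothetical and are not part of the gss data. It compares two ways of writing the filter: testing against the string "NA", which only works because filter() discards rows where the condition itself evaluates to NA, and testing with is.na(), which states the intent directly.

```r
library(dplyr)

# Hypothetical stand-in for the survey data (not the real gss data frame)
toy <- data.frame(
  race     = factor(c("White", "Black", "Other", "White")),
  natcrime = factor(c("Too Little", NA, "About Right", NA),
                    levels = c("Too Little", "About Right", "Too Much"))
)

# Comparing against the string "NA" returns NA for missing values,
# and filter() drops rows where the condition is NA
by_string <- filter(toy, natcrime != "NA")

# is.na() tests for missingness explicitly
by_is_na <- filter(toy, !is.na(natcrime))

# Number of rows kept by each approach
nrow(by_string)
nrow(by_is_na)
```

Both versions should keep the same rows here, but the is.na() form does not depend on how comparisons with missing values happen to behave.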

RemovedNA %>%
  ggplot(aes(race, fill = natcrime)) +  
  geom_bar(position = "fill")+
  theme_minimal() +             
  labs(                         
    title    = "General Social Survey", 
    subtitle = "Race and Crime",
    y        = "Proportion of Subjects",
    x        = "Race")

I reran the code for the previous bar chart, this time using the data frame that excludes subjects who did not respond to the natcrime question. The x-axis represents the person’s race (White, Black, and Other), and the y-axis represents the proportion of subjects. The color fill in each bar represents the three responses to the natcrime question (Too Little, About Right, and Too Much). This data visualization suggests that the distribution of responses differs by race.


Part 4: Inference

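I used a chi-square test of independence to examine whether there is a relationship between people’s opinions about halting the rising crime rate and their reported race, both of which are categorical variables. This test assumes that the observations are independent of one another and that every cell of the table has an expected count of at least 5. The expected counts in the output below are all well above 5 (the smallest is about 77).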

inference(y = natcrime, x = race, data = RemovedNA, statistic = "proportion", 
          type = "ht", method = "theoretical", success = "Too Little", alternative = "greater")
## Response variable: categorical (3 levels) 
## Explanatory variable: categorical (3 levels) 
## Observed:
##        y
## x       Too Little About Right Too Much
##   White      17412        7209     1562
##   Black       3258         797      255
##   Other        830         368       90
## 
## Expected:
##        y
## x       Too Little About Right   Too Much
##   White 17712.9260   6898.9787 1571.09534
##   Black  2915.7358   1135.6452  258.61899
##   Other   871.3382    339.3761   77.28567
## 
## H0: race and natcrime are independent
## HA: race and natcrime are dependent
## chi_sq = 166.7738, df = 4, p_value = 0
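
As a cross-check on the output above, the same test can be sketched with base R’s chisq.test(), using the observed counts copied from that output. The object names observed and result are my own choices for this illustration. This is only a sanity check under the assumption that the printed counts are the full table, not a replacement for the inference() call.

```r
# Observed counts copied from the inference() output above
observed <- matrix(
  c(17412, 7209, 1562,
     3258,  797,  255,
      830,  368,   90),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    race     = c("White", "Black", "Other"),
    natcrime = c("Too Little", "About Right", "Too Much")
  )
)

# Pearson chi-square test of independence (base R stats package)
result <- chisq.test(observed)

result$statistic      # chi-square test statistic
result$parameter      # degrees of freedom: (3 - 1) * (3 - 1)
result$expected       # expected counts under independence
min(result$expected)  # smallest expected count, for the "at least 5" condition
result$p.value        # extremely small, which is why it is displayed as 0 above
```

The statistic, degrees of freedom, and expected counts should agree with the values printed by inference().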

Results

Chi-Square Test

A chi-square test of independence was performed to examine the association between race and opinions about halting crime. The relationship between these variables was significant, χ² (4, N = 31,781) = 166.8, p < .001. Based on this statistical test, I reject the null hypothesis that there is no relationship between a person’s race and his/her opinion about halting the rising crime rate. Knowing a person’s race therefore helps predict his/her opinion about government spending to halt the rising crime rate.
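
To see what the significant result looks like in terms of responses, the observed counts can be converted to row proportions, that is, the share of each racial group giving each answer. This is a small sketch that reuses the counts from the test output; the names observed and row_props are my own.

```r
# Observed counts copied from the chi-square output
observed <- matrix(
  c(17412, 7209, 1562,
     3258,  797,  255,
      830,  368,   90),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    race     = c("White", "Black", "Other"),
    natcrime = c("Too Little", "About Right", "Too Much")
  )
)

# Proportion of each response within each racial group (each row sums to 1)
row_props <- prop.table(observed, margin = 1)

round(row_props, 3)
```

In these counts, roughly 76% of Black respondents answered "Too Little", compared with about 66% of White respondents and about 64% of respondents in the Other group.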

Confidence Interval

Because a chi-square test of independence compares the full distribution of a categorical response across groups rather than estimating a single parameter, there is no confidence interval associated with this test.

Discussion

These data suggest that a person’s race is associated with his/her opinion on how the government handles crime. Researchers interested in this topic could conduct a more rigorous analysis by controlling for other variables (e.g., socioeconomic status) that might influence a person’s opinion on this issue.