According to the GSS website, the General Social Survey (GSS) data are collected from an independently drawn sample of adults aged 18 or older living in non-institutional arrangements within the United States. The data were collected through personal interviews, with a random sample drawn in each survey year. Hence, the results can be generalized to the adult, non-institutionalized US population.
Because the GSS is a survey, and therefore an observational study rather than an experiment, it cannot be used to establish causality; only associations can be identified.
Is there an association between race and political affiliation?
According to the news media, there are sizable and long-standing racial and ethnic differences in political affiliation. It would be interesting to see how the political leanings of each racial group have varied over the years in the US, and to test for an association between race and political affiliation.
The variables to be used are:
race: Race of respondent (Categorical Variable)
partyid: Political party affiliation (Categorical Variable)
Let us first create a table between partyid and race in order to get a sense of the data.
##
## White Black Other
## Strong Democrat 5692 3075 350
## Not Str Democrat 9192 2176 672
## Ind,Near Dem 5489 895 359
## Independent 6767 962 770
## Ind,Near Rep 4517 235 169
## Not Str Republican 8450 297 258
## Strong Republican 5276 141 131
## Other Party 745 71 45
The race variable has 3 categories and the partyid variable has 8 categories. For simplicity, I will group “Strong Democrat”, “Not Str Democrat” and “Ind,Near Dem” as Democrats. Similarly, “Ind,Near Rep”, “Not Str Republican” and “Strong Republican” will be grouped as Republican.
After that, let’s visualize the data using a bar plot, since both variables are categorical.
# Creating a data frame called party that holds the partyid, race, and year variables.
party <- gss %>%
  filter(!is.na(partyid), !is.na(race), !is.na(year)) %>%
  select(partyid, race, year)
# Function that collapses the eight partyid levels into four groups.
party_short <- function(word) {
  short <- as.character(word)
  if (short %in% c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem")) {
    return("Democrat")
  }
  if (short %in% c("Ind,Near Rep", "Not Str Republican", "Strong Republican")) {
    return("Republican")
  }
  if (short == "Independent") {
    return("Independent")
  }
  if (short == "Other Party") {
    return("Other Party")
  }
}
# Apply the function to every response (missing values were filtered out above).
party$partyid <- sapply(party$partyid, party_short)
table(party$partyid)
##
## Democrat Independent Other Party Republican
## 27900 8499 861 19474
##
## White Black Other
## Democrat 20373 6146 1381
## Independent 6767 962 770
## Other Party 745 71 45
## Republican 18243 673 558
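The same grouping could also be done without a per-element function. The following is a minimal sketch in base R using a named lookup vector; `party_groups` and `example_partyid` are hypothetical names introduced here for illustration, and the small factor merely stands in for the real `partyid` column.

```r
# Hypothetical lookup table: names are the original partyid labels,
# values are the collapsed groups used in this analysis.
party_groups <- c(
  "Strong Democrat"    = "Democrat",
  "Not Str Democrat"   = "Democrat",
  "Ind,Near Dem"       = "Democrat",
  "Independent"        = "Independent",
  "Ind,Near Rep"       = "Republican",
  "Not Str Republican" = "Republican",
  "Strong Republican"  = "Republican",
  "Other Party"        = "Other Party"
)

# A small made-up factor standing in for the partyid column.
example_partyid <- factor(c("Strong Democrat", "Ind,Near Rep",
                            "Independent", "Other Party", "Ind,Near Dem"))

# Index the lookup by label; as.character() avoids indexing by factor codes.
collapsed <- unname(party_groups[as.character(example_partyid)])
collapsed
```

Because the lookup is vectorized, it handles the whole column in one step and makes the mapping easy to audit at a glance.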
# Plotting using ggplot2
ggplot(data = party, aes(x = partyid, fill = race)) +
  geom_bar(position = "fill") +
  labs(title = "Race & Political Affiliation",
       y = "Proportion", x = "Political Party")
From the above plot, the ‘Other Party’ and ‘Independent’ groups show similar race proportions. A clear difference in race proportions is noticeable between the ‘Democrat’ and ‘Republican’ groups: Black and Other respondents make up a larger share of Democrats than of Republicans.
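The proportions behind the bar heights can be computed directly from the party-by-race counts shown earlier. This is a minimal sketch in base R; the matrix is typed in by hand from that table, and the names `counts` and `race_share` are introduced here only for illustration.

```r
# Party-by-race counts, copied from the cross-tabulation shown earlier.
counts <- matrix(
  c(20373, 6146, 1381,
     6767,  962,  770,
      745,   71,   45,
    18243,  673,  558),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    partyid = c("Democrat", "Independent", "Other Party", "Republican"),
    race    = c("White", "Black", "Other")
  )
)

# margin = 1 divides each row by its row total, giving the share of
# each race within a party group (what position = "fill" displays).
race_share <- prop.table(counts, margin = 1)
round(race_share, 3)
```

For example, Black respondents are about 22% of the Democrat group but only about 3.5% of the Republican group.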
Let us also visualize how the racial composition of each political group has changed over the years.
# Count respondents by party, year, and race, then compute each race's
# proportion within every party-year combination.
b <- party %>%
  group_by(partyid, year, race) %>%
  summarise(n = n()) %>%
  # summarise() drops the last grouping level (race), so sum(n) is the
  # party-year total and every race gets the same n / sum(n) formula.
  mutate(Prop = n / sum(n))
b
## # A tibble: 325 x 5
## # Groups: partyid, year [116]
## partyid year race n Prop
## <chr> <int> <fct> <int> <dbl>
## 1 Democrat 1972 White 704 0.764
## 2 Democrat 1972 Black 215 0.233
## 3 Democrat 1972 Other 3 0.00325
## 4 Democrat 1973 White 657 0.815
## 5 Democrat 1973 Black 144 0.179
## 6 Democrat 1973 Other 5 0.00620
## 7 Democrat 1974 White 691 0.837
## 8 Democrat 1974 Black 131 0.159
## 9 Democrat 1974 Other 4 0.00484
## 10 Democrat 1975 White 683 0.842
## # ... with 315 more rows
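To make clear what Prop measures, the first rows of the output can be reproduced by hand. This is a sketch in base R, assuming only the six counts printed above; `first_rows` is a hypothetical data frame built for illustration, and `ave()` is used here in place of the dplyr pipeline.

```r
# The first six rows of the tibble above, typed in by hand.
first_rows <- data.frame(
  partyid = "Democrat",
  year    = c(1972, 1972, 1972, 1973, 1973, 1973),
  race    = c("White", "Black", "Other", "White", "Black", "Other"),
  n       = c(704, 215, 3, 657, 144, 5)
)

# Divide each count by the total for its party-year combination.
first_rows$Prop <- ave(first_rows$n, first_rows$partyid, first_rows$year,
                       FUN = function(x) x / sum(x))
first_rows
```

Each Prop value is therefore the share of a race within one party group in one year, and the values for a given party and year sum to one.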
# Plotting b using ggplot2
ggplot(data = b, aes(x = year, y = Prop)) +
  geom_smooth(aes(fill = race)) +
  facet_wrap(~partyid)
From the above plot, we can deduce the following:
Black and Other respondents make up a growing share of those identifying as Democrats over the years.
The Independent and Democrat groups show similar trends.
Within the Republican group, the racial composition hardly changes from 1972 to 2010, and White respondents make up the majority.
The trend within the Other Party group is somewhat similar to that of the Republican group.
Next, let’s do a statistical test to verify whether there is any association between race and party.
Null Hypothesis: Race and political affiliation are independent of each other.
Alternative Hypothesis: There is an association between race and political affiliation.
Since both variables are categorical, with three race groups and four political party groups, we will use the chi-square test of independence. But first, we check whether the conditions for the chi-square test are satisfied.
1) Independence: As mentioned earlier, the GSS uses random sampling, the sample is far less than 10% of the US adult population, and each respondent falls into exactly one cell of the table, so the observations can be treated as independent.
2) Expected Counts
## party$race
## party$partyid White Black Other
## Democrat 22684.3022 3861.3671 1354.33074
## Independent 6910.1751 1176.2638 412.56118
## Other Party 700.0424 119.1626 41.79494
## Republican 15833.4803 2695.2065 945.31315
Every cell has an expected count of at least five, so this condition is satisfied too.
3) Degrees of Freedom: The degrees of freedom are given by (4-1)*(3-1) = 6.
All the conditions have been checked. Since both categorical variables have more than two levels, there is no associated confidence interval to compute, so we proceed with the hypothesis test only.
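The expected counts and degrees of freedom in the checks above can be reproduced from the observed table alone. This is a minimal base R sketch under the assumption that the observed counts are those printed earlier; `observed`, `expected`, and `df` are names chosen here for illustration.

```r
# Observed party-by-race counts, copied from the table shown earlier.
observed <- matrix(
  c(20373, 6146, 1381,
     6767,  962,  770,
      745,   71,   45,
    18243,  673,  558),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    partyid = c("Democrat", "Independent", "Other Party", "Republican"),
    race    = c("White", "Black", "Other")
  )
)

# Under independence: expected = (row total * column total) / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 2)

# Expected-count condition and degrees of freedom.
all(expected >= 5)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
df
```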
Now, let’s continue with the chi-square test.
# Chi-square test of independence at the 5% significance level.
inference(y = partyid, x = race, data = party, type = "ht",
          statistic = "proportion", method = "theoretical", alternative = "greater")
## Response variable: categorical (4 levels)
## Explanatory variable: categorical (3 levels)
## Observed:
## y
## x Democrat Independent Other Party Republican
## White 20373 6767 745 18243
## Black 6146 962 71 673
## Other 1381 770 45 558
##
## Expected:
## y
## x Democrat Independent Other Party Republican
## White 22684.302 6910.1751 700.04244 15833.4803
## Black 3861.367 1176.2638 119.16262 2695.2065
## Other 1354.331 412.5612 41.79494 945.3131
##
## H0: race and partyid are independent
## HA: race and partyid are dependent
## chi_sq = 4004.6596, df = 6, p_value = 0
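As a cross-check on the output above, the same test can be run with base R's `chisq.test()` on the observed table. This is a sketch, not necessarily how `inference()` computes the result internally; the counts are typed in by hand from the Observed table, and `observed` and `result` are names introduced here.

```r
# Observed party-by-race counts, copied from the output above.
observed <- matrix(
  c(20373, 6146, 1381,
     6767,  962,  770,
      745,   71,   45,
    18243,  673,  558),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    partyid = c("Democrat", "Independent", "Other Party", "Republican"),
    race    = c("White", "Black", "Other")
  )
)

# Pearson chi-square test of independence.
result <- chisq.test(observed)
result$statistic  # chi-square statistic
result$parameter  # degrees of freedom
result$p.value    # too small to represent, so it prints as 0
```

Transposing the table does not change the statistic, so it does not matter that the output above lists race in the rows.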
Conclusion
The p-value is reported as 0, meaning it is too small to display, and is far below the significance level of 0.05. Hence, we reject the Null Hypothesis in favor of the Alternative Hypothesis: the data provide strong evidence of an association between race and political party affiliation.
These results match what we saw in the plots above. Black respondents lean heavily toward the Democrat group (6146 observed versus about 3861 expected under independence), and both Black and Other respondents are under-represented in the Republican group. Because the data are observational, this association does not establish causation.
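To see which cells contribute most to the association, one option is to inspect the Pearson residuals, (observed - expected) / sqrt(expected). The sketch below uses base R's `chisq.test()` on the hand-typed counts; `observed` and `pearson_resid` are illustrative names, and this step is an addition rather than part of the original analysis.

```r
# Observed party-by-race counts, copied from the test output.
observed <- matrix(
  c(20373, 6146, 1381,
     6767,  962,  770,
      745,   71,   45,
    18243,  673,  558),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    partyid = c("Democrat", "Independent", "Other Party", "Republican"),
    race    = c("White", "Black", "Other")
  )
)

# Positive residuals mark cells with more respondents than expected
# under independence; negative residuals mark cells with fewer.
pearson_resid <- chisq.test(observed)$residuals
round(pearson_resid, 1)
```

The largest residuals in absolute value fall in the Black column, positive for the Democrat group and negative for the Republican group.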