The dataset analysed here is an extract of the General Social Survey (GSS), a survey conducted since 1972 to gather societal data, to monitor and explain trends, and to allow comparison across societies. The extract contains 57,061 observations of 114 variables, with missing values removed from the responses.
As random sampling was used in this survey, the results of this project can be generalized to the US population.
However, participants were not randomly assigned to control and treatment groups, so this is an observational study and no causal conclusions can be drawn from the dataset.
The research question I would like answered is:
Is there a relationship between race and political party affiliation?
This is of interest to me because it is often claimed in the media that race is strongly related to an individual's political leanings, and I would like to use statistics to test this anecdotal claim.
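library(dplyr)
library(formattable)
# Keep the two variables of interest and drop incomplete rows
# (assumes the `gss` data frame is already loaded).
framing <- gss %>%
  select(partyid, race) %>%
  na.omit()
# Count each partyid/race pair. summarise() leaves the data grouped by partyid,
# so percfreq is the share of each race within a party category.
raceparty <- framing %>%
  group_by(partyid, race) %>%
  summarise(n = n()) %>%
  mutate(percfreq = n / sum(n))
formattable(raceparty)

| partyid | race | n | percfreq |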
|---|---|---|---|
| Strong Democrat | White | 5692 | 0.62432818 |
| Strong Democrat | Black | 3075 | 0.33728200 |
| Strong Democrat | Other | 350 | 0.03838982 |
| Not Str Democrat | White | 9192 | 0.76345515 |
| Not Str Democrat | Black | 2176 | 0.18073090 |
| Not Str Democrat | Other | 672 | 0.05581395 |
| Ind,Near Dem | White | 5489 | 0.81402936 |
| Ind,Near Dem | Black | 895 | 0.13273024 |
| Ind,Near Dem | Other | 359 | 0.05324040 |
| Independent | White | 6767 | 0.79621132 |
| Independent | Black | 962 | 0.11318979 |
| Independent | Other | 770 | 0.09059889 |
| Ind,Near Rep | White | 4517 | 0.91790287 |
| Ind,Near Rep | Black | 235 | 0.04775452 |
| Ind,Near Rep | Other | 169 | 0.03434261 |
| Not Str Republican | White | 8450 | 0.93836757 |
| Not Str Republican | Black | 297 | 0.03298168 |
| Not Str Republican | Other | 258 | 0.02865075 |
| Strong Republican | White | 5276 | 0.95097332 |
| Strong Republican | Black | 141 | 0.02541456 |
| Strong Republican | Other | 131 | 0.02361211 |
| Other Party | White | 745 | 0.86527294 |
| Other Party | Black | 71 | 0.08246225 |
| Other Party | Other | 45 | 0.05226481 |
From this initial table, there appears to be a strong relationship between race and political leanings: White respondents make up about 95% of Strong Republicans but only about 62% of Strong Democrats, while Black respondents make up about 34% of Strong Democrats and under 3% of Strong Republicans. The same proportions are visualized below.
library(ggplot2)
# Stacked horizontal bars: racial composition (in percent) of each party category
ggplot(raceparty, aes(x = partyid, y = percfreq * 100, fill = race)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(title = "Proportion of Support by Race between political parties", y = "Percentage")

The chart suggests a relationship between race and political leanings, which makes a case for formal statistical inference.
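The proportions above describe the racial make-up of each party category. As a complementary check, the sketch below computes the opposite view: the share of each race that falls into each party category. It rebuilds the counts from the table above, so it does not need the `gss` data; the names `counts` and `within_race` are illustrative and not part of the original analysis.

```r
# Observed counts copied from the table above (rows: partyid, columns: race).
counts <- matrix(
  c(5692, 3075, 350,
    9192, 2176, 672,
    5489,  895, 359,
    6767,  962, 770,
    4517,  235, 169,
    8450,  297, 258,
    5276,  141, 131,
     745,   71,  45),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem", "Independent",
      "Ind,Near Rep", "Not Str Republican", "Strong Republican", "Other Party"),
    c("White", "Black", "Other")
  )
)

# margin = 2 divides each column by its total, so every column sums to 1.
within_race <- prop.table(counts, margin = 2)
round(within_race, 3)
```

Reading down a column shows how respondents of one race are distributed across the party categories, which is the comparison the research question is really about.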
As both selected variables are categorical, and each has more than two levels, the chi-squared test of independence is an appropriate test to determine whether a relationship is present between the two.
As only one test will be conducted, there is no need to apply a Bonferroni correction, so I set the significance level to 0.05.
Hypotheses:
The null hypothesis (H0) is that political party affiliation is independent of race.
The alternative hypothesis (HA) is that political party affiliation is not independent of race, i.e. it varies with race.
Conditions:
Independence: Sampled observations must be independent
As the respondents were selected via random sampling, and the sample is far smaller than 10% of the US population, we can assume the sampled observations are independent of each other.
Sample Size: Each cell in the contingency table must contain at least 5 expected cases
From the table below, the smallest observed count is 45 (partyid: Other Party, race: Other), and the smallest expected count, reported with the test results further down, is about 41.8 for the same cell, so this condition holds.
##
## White Black Other
## Strong Democrat 5692 3075 350
## Not Str Democrat 9192 2176 672
## Ind,Near Dem 5489 895 359
## Independent 6767 962 770
## Ind,Near Rep 4517 235 169
## Not Str Republican 8450 297 258
## Strong Republican 5276 141 131
## Other Party 745 71 45
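The code that produced this contingency table is not shown. The sketch below is one way to rebuild it from the counts reported earlier, using `xtabs()` on a long-format data frame. Only the name `contitab` comes from the test output; `long`, `party_levels` and `race_levels` are illustrative, and the original analysis may well have built the table directly from `framing` instead.

```r
party_levels <- c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem",
                  "Independent", "Ind,Near Rep", "Not Str Republican",
                  "Strong Republican", "Other Party")
race_levels <- c("White", "Black", "Other")

# One row per partyid/race pair, in the same order as the summary table.
long <- data.frame(
  partyid = factor(rep(party_levels, each = 3), levels = party_levels),
  race    = factor(rep(race_levels, times = 8), levels = race_levels),
  n       = c(5692, 3075, 350,
              9192, 2176, 672,
              5489,  895, 359,
              6767,  962, 770,
              4517,  235, 169,
              8450,  297, 258,
              5276,  141, 131,
               745,   71,  45)
)

# Cross-tabulate the counts: rows are partyid, columns are race.
contitab <- xtabs(n ~ partyid + race, data = long)
contitab

# Smallest observed cell, used for the sample-size check.
min(contitab)
```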
Chi-Square Independence Test:
With this contingency table, we can now apply the chi-squared test to check whether party affiliation and race are indeed independent.
##
## Pearson's Chi-squared test
##
## data: contitab
## X-squared = 5670.8, df = 14, p-value < 2.2e-16
The test has (8 - 1) × (3 - 1) = 14 degrees of freedom, one fewer than the number of party categories times one fewer than the number of race categories. The p-value is very small (approximately zero). As it is much smaller than the significance level we selected (0.05), we reject the null hypothesis in favor of the alternative hypothesis. We conclude that political leanings are not independent of race.
The expected counts for the table, if race and political leanings were independent (i.e. if H0 were true), are shown below:
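For readers without the original data, the test can be re-run from the observed counts alone. This is a minimal sketch that assumes the counts in the contingency table above are the full input to the test; `test` and `df_by_hand` are illustrative names.

```r
# Observed counts from the contingency table (rows: partyid, columns: race).
contitab <- matrix(
  c(5692, 3075, 350,
    9192, 2176, 672,
    5489,  895, 359,
    6767,  962, 770,
    4517,  235, 169,
    8450,  297, 258,
    5276,  141, 131,
     745,   71,  45),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem", "Independent",
      "Ind,Near Rep", "Not Str Republican", "Strong Republican", "Other Party"),
    c("White", "Black", "Other")
  )
)

# Pearson's chi-squared test of independence on the two-way table.
test <- chisq.test(contitab)

# Degrees of freedom computed by hand: (rows - 1) * (columns - 1).
df_by_hand <- (nrow(contitab) - 1) * (ncol(contitab) - 1)

test$statistic       # chi-squared statistic
test$parameter       # degrees of freedom reported by the test
test$p.value < 0.05  # TRUE means H0 is rejected at the 0.05 level
```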
##
## White Black Other
## Strong Democrat 7412.6446 1261.7951 442.56033
## Not Str Democrat 9789.2114 1666.3391 584.44954
## Ind,Near Dem 5482.4462 933.2329 327.32087
## Independent 6910.1751 1176.2638 412.56118
## Ind,Near Rep 4001.0556 681.0676 238.87676
## Not Str Republican 7321.5821 1246.2943 437.12359
## Strong Republican 4510.8426 767.8446 269.31279
## Other Party 700.0424 119.1626 41.79494
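Each expected count is the row total times the column total, divided by the grand total. The sketch below recomputes them by hand from the observed counts and compares the result with what `chisq.test()` reports; the name `expected_by_hand` is illustrative.

```r
# Observed counts from the contingency table (rows: partyid, columns: race).
contitab <- matrix(
  c(5692, 3075, 350,
    9192, 2176, 672,
    5489,  895, 359,
    6767,  962, 770,
    4517,  235, 169,
    8450,  297, 258,
    5276,  141, 131,
     745,   71,  45),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem", "Independent",
      "Ind,Near Rep", "Not Str Republican", "Strong Republican", "Other Party"),
    c("White", "Black", "Other")
  )
)

# Expected count under independence: row total * column total / grand total.
expected_by_hand <- outer(rowSums(contitab), colSums(contitab)) / sum(contitab)
dimnames(expected_by_hand) <- dimnames(contitab)
round(expected_by_hand, 2)

# The same values as stored by chisq.test().
all.equal(expected_by_hand, chisq.test(contitab)$expected,
          check.attributes = FALSE)

# Smallest expected count, relevant to the sample-size condition.
min(expected_by_hand)
```

Comparing these with the observed counts shows where the departure from independence is largest: for example, 3075 Strong Democrats are Black against roughly 1262 expected.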
Thanks for reading! I hope this project was as insightful for you as it was for me!