Setup

Load data


Part 1: Data

The dataset analysed here is an extract of the General Social Survey (GSS), a survey conducted since 1972 to gather societal data in order to monitor and explain trends and to allow comparison across societies. The extract contains 57,061 observations of 114 variables, with missing values removed from the responses.

Because the survey used random sampling, the results of this project can be generalized to the US population that the survey samples from.

However, this is an observational study: participants were not randomly assigned to control and treatment groups, so no causal conclusions can be drawn from this dataset.


Part 2: Research question

The research question I would like answered is:

Is there a relationship between race and political party affiliation?

This interests me because it is often claimed in the media that race is strongly related to an individual's political leanings, and I wish to take this opportunity to test this anecdotal claim statistically.


Part 3: Exploratory data analysis

partyid             race   n     percfreq
Strong Democrat     White  5692  0.62432818
Strong Democrat     Black  3075  0.33728200
Strong Democrat     Other   350  0.03838982
Not Str Democrat    White  9192  0.76345515
Not Str Democrat    Black  2176  0.18073090
Not Str Democrat    Other   672  0.05581395
Ind,Near Dem        White  5489  0.81402936
Ind,Near Dem        Black   895  0.13273024
Ind,Near Dem        Other   359  0.05324040
Independent         White  6767  0.79621132
Independent         Black   962  0.11318979
Independent         Other   770  0.09059889
Ind,Near Rep        White  4517  0.91790287
Ind,Near Rep        Black   235  0.04775452
Ind,Near Rep        Other   169  0.03434261
Not Str Republican  White  8450  0.93836757
Not Str Republican  Black   297  0.03298168
Not Str Republican  Other   258  0.02865075
Strong Republican   White  5276  0.95097332
Strong Republican   Black   141  0.02541456
Strong Republican   Other   131  0.02361211
Other Party         White   745  0.86527294
Other Party         Black    71  0.08246225
Other Party         Other    45  0.05226481
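
The percfreq column can be reproduced from the counts alone: within each partyid level, each race count is divided by that level's total. The code behind the original table is not shown in this report, so the following is only a minimal sketch in Python (standard library only) under that assumption; the names `counts`, `RACES` and `within_party_shares` are illustrative and not taken from the original analysis.

```python
# Counts copied from the table above, ordered (White, Black, Other) per partyid.
RACES = ("White", "Black", "Other")
counts = {
    "Strong Democrat": (5692, 3075, 350),
    "Not Str Democrat": (9192, 2176, 672),
    "Ind,Near Dem": (5489, 895, 359),
    "Independent": (6767, 962, 770),
    "Ind,Near Rep": (4517, 235, 169),
    "Not Str Republican": (8450, 297, 258),
    "Strong Republican": (5276, 141, 131),
    "Other Party": (745, 71, 45),
}


def within_party_shares(table):
    """Return {partyid: {race: share}}; shares sum to 1 within each partyid."""
    shares = {}
    for party, row in table.items():
        total = sum(row)  # number of respondents in this partyid level
        shares[party] = {race: n / total for race, n in zip(RACES, row)}
    return shares


shares = within_party_shares(counts)
for party, row in shares.items():
    # One line per partyid, one column per race.
    print(f"{party:<20}" + "".join(f"{row[race]:>10.4f}" for race in RACES))
```

Normalizing within partyid (rather than within race) answers "what is the racial make-up of each party category?", which is the comparison the table is built for.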

In this table, percfreq is the share of each race within a party category. The table suggests a strong relationship between race and political leanings: White respondents make up about 95% of Strong Republicans but only about 62% of Strong Democrats, whereas Black respondents make up about 34% of Strong Democrats and under 3% of Strong Republicans. We proceed to visualize this below.

The chart also points to a relationship between race and political leanings, so there is a case for formal statistical inference.


Part 4: Inference

As both selected variables are categorical and each has more than two levels, the chi-squared test of independence is the appropriate test for determining whether a relationship exists between the two.

As only one test will be conducted, there is no need to apply a Bonferroni correction. I therefore set the significance level at 0.05.

Hypotheses:
The null hypothesis (H0) is that political leanings are independent of race.

The alternative hypothesis (HA) is that political leanings do vary with race.

Conditions:
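Independence: Sampled observations must be independent
As the respondents were selected by random sampling, the sample is far smaller than 10% of the US population, and each respondent contributes to exactly one cell of the table, we can assume the sampled observations are independent of each other.

Sample Size: Each cell in the contingency table must have at least 5 expected cases
In the table below, the smallest observed count is 45 (partyid: Other Party, race: Other). The smallest expected count under independence, shown in the table at the end of this section, is about 41.8 for the same cell, hence this condition holds.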

##                     
##                      White Black Other
##   Strong Democrat     5692  3075   350
##   Not Str Democrat    9192  2176   672
##   Ind,Near Dem        5489   895   359
##   Independent         6767   962   770
##   Ind,Near Rep        4517   235   169
##   Not Str Republican  8450   297   258
##   Strong Republican   5276   141   131
##   Other Party          745    71    45

Chi-Square Independence Test:
With this contingency table, we can now apply the chi-squared test to check whether political leanings and race are independent.

## 
##  Pearson's Chi-squared test
## 
## data:  contitab
## X-squared = 5670.8, df = 14, p-value < 2.2e-16

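The 14 degrees of freedom follow from (8 - 1) x (3 - 1), for the 8 party categories and 3 race categories. The p-value is very small (effectively zero). As it is much smaller than the significance level we selected (0.05), we reject the null hypothesis in favor of the alternative hypothesis. We conclude that the data provide strong evidence that political leanings are not independent of race.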
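
As a cross-check on the reported output, the test statistic and degrees of freedom can be recomputed by hand from the observed counts. The sketch below is in Python (standard library only) and is not the code used for this report; the function names are illustrative. For the p-value it uses the closed-form upper tail of the chi-squared distribution that holds when the degrees of freedom are even, and works on the log10 scale because the p-value itself is too small to represent as a floating-point number.

```python
import math

# Observed counts from the contingency table (columns: White, Black, Other).
observed = [
    [5692, 3075, 350],   # Strong Democrat
    [9192, 2176, 672],   # Not Str Democrat
    [5489, 895, 359],    # Ind,Near Dem
    [6767, 962, 770],    # Independent
    [4517, 235, 169],    # Ind,Near Rep
    [8450, 297, 258],    # Not Str Republican
    [5276, 141, 131],    # Strong Republican
    [745, 71, 45],       # Other Party
]


def chi_squared_independence(table):
    """Return (statistic, df) for Pearson's chi-squared test of independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # expected count under H0
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def log10_upper_tail_even_df(stat, df):
    """log10 of P(X > stat) for chi-squared X; valid only for even df.

    Uses P(X > x) = exp(-x/2) * sum_{i < df/2} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    if stat <= 0:
        return 0.0  # p-value is 1
    half = stat / 2
    log_terms = [i * math.log(half) - math.lgamma(i + 1) for i in range(df // 2)]
    biggest = max(log_terms)  # log-sum-exp to avoid overflow
    log_sum = biggest + math.log(sum(math.exp(t - biggest) for t in log_terms))
    return (-half + log_sum) / math.log(10)


stat, df = chi_squared_independence(observed)
log10_p = log10_upper_tail_even_df(stat, df)
print(f"X-squared = {stat:.1f}, df = {df}, log10(p-value) = {log10_p:.1f}")
```

The recomputed statistic and degrees of freedom should agree with the output above, and the log10 p-value should lie far below log10(2.2e-16), which is the smallest value the reported output distinguishes.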

The expected counts for each cell, if race and political leanings were independent (i.e. if H0 were true), are shown below:

##                     
##                          White     Black     Other
##   Strong Democrat    7412.6446 1261.7951 442.56033
##   Not Str Democrat   9789.2114 1666.3391 584.44954
##   Ind,Near Dem       5482.4462  933.2329 327.32087
##   Independent        6910.1751 1176.2638 412.56118
##   Ind,Near Rep       4001.0556  681.0676 238.87676
##   Not Str Republican 7321.5821 1246.2943 437.12359
##   Strong Republican  4510.8426  767.8446 269.31279
##   Other Party         700.0424  119.1626  41.79494
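
Each expected count is the product of the cell's row total and column total divided by the overall total. The following Python sketch (standard library only, not the code used to produce the table above; `expected_counts` is an illustrative name) recomputes them from the observed counts and also reports the smallest expected count, which is the quantity the sample-size condition depends on.

```python
# Observed counts from the contingency table (columns: White, Black, Other).
observed = [
    [5692, 3075, 350],   # Strong Democrat
    [9192, 2176, 672],   # Not Str Democrat
    [5489, 895, 359],    # Ind,Near Dem
    [6767, 962, 770],    # Independent
    [4517, 235, 169],    # Ind,Near Rep
    [8450, 297, 258],    # Not Str Republican
    [5276, 141, 131],    # Strong Republican
    [745, 71, 45],       # Other Party
]


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


expected = expected_counts(observed)
smallest = min(min(row) for row in expected)

for row in expected:
    print("".join(f"{value:>12.4f}" for value in row))
print(f"smallest expected count: {smallest:.5f}")
```

Because the expected counts preserve the row and column totals of the observed table, comparing the two tables cell by cell shows where the data depart most from independence.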

Thanks for reading! I hope this project was as insightful to you as it was to me!