The research question for this study concerns the relationship between stated party affiliation and trust in the financial system. Specifically, can we claim that trust in financial institutions is independent of each survey respondent's political inclination? The answer matters because the party in power can influence the functioning of the financial system in many ways, mainly by introducing new regulations or amending existing ones. The smooth functioning of the banking sector, in turn, strongly affects the economy as a whole, and hence everyone's wellbeing.
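# Keep only the two variables of interest for the 2012 survey year
gss_subset2012 <- subset(gss, year == 2012, select = c("partyid", "confinan"))
# All 2012 respondents, including those with missing answers
cases_with_na <- nrow(gss_subset2012)
# Respondents who answered both questions (no NA in either column)
cases <- sum(complete.cases(gss_subset2012))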
The 2012 subset contains 1974 cases. However, not all respondents answered both questions, so the number of complete cases is 1315.
Variables. The two variables are “partyid” (stated party affiliation) and “confinan” (confidence in financial institutions). Both are categorical. “confinan” is also ordinal, because its levels “A Great Deal”, “Only Some” and “Hardly Any” have a natural order, although this study does not use that ordering.
This is an observational study. Respondents are randomly sampled from the population, but there is no random assignment to groups. As mentioned before, we assume that this is a random sample. The respondents are simply randomly selected members of the population who are asked survey questions.
Scope of inference - generalizability. The population of interest in this study is the US population as a whole. Because we assume a random sample, the findings can be generalized to that population. The sampling process could still introduce bias, in which case the results may differ somewhat from reality. A closer look at the GSS documentation shows that the survey is restricted to the English-speaking population, so the results can be generalized only to that segment of the US population.
Scope of inference - causality. This analysis cannot establish causation between the two variables, because the study uses only random sampling and no random assignment. We can generalize our findings, but we cannot interpret them causally.
First we can look at the basic summary statistics for the two variables studied, as well as some basic visualizations of the data.
table(gss_subset2012$partyid)
##
## Strong Democrat Not Str Democrat Ind,Near Dem
## 356 343 235
## Independent Ind,Near Rep Not Str Republican
## 373 157 250
## Strong Republican Other Party
## 192 54
table(gss_subset2012$confinan)
##
## A Great Deal Only Some Hardly Any
## 149 678 498
# Side-by-side bar plots of the two marginal distributions
par(mfrow = c(1, 2), cex = 0.7)
barplot(table(gss_subset2012$partyid), cex.names = 0.5, col = "red",
        main = "Party affiliation", ylab = "Number of respondents")
barplot(table(gss_subset2012$confinan), cex.names = 0.9, col = "lightblue",
        main = "Trust in financial system")
These plots show that most respondents have “only some” or “hardly any” trust in financial institutions. They also show that respondents leaning towards the Democratic party outnumber those leaning towards the Republican party, and that Independents form a sizable group. These findings are not central to this study, since we are interested in whether the two variables are dependent.
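The marginal counts printed above can be turned into shares to quantify these statements. The following is a minimal sketch that copies the counts by hand from the `table()` output. Treating “Ind,Near Dem” as Democratic-leaning and “Ind,Near Rep” as Republican-leaning is an assumption made for illustration.

```r
# Marginal counts of confinan for 2012, copied from the table() output above
confinan_counts <- c("A Great Deal" = 149, "Only Some" = 678, "Hardly Any" = 498)
# Share of respondents in each trust category
confinan_share <- prop.table(confinan_counts)
round(confinan_share, 3)
# Share with at most "only some" trust
low_trust_share <- sum(confinan_share[c("Only Some", "Hardly Any")])
low_trust_share

# Party-leaning totals (assumption: leaners are grouped with their party)
dem_leaning <- 356 + 343 + 235   # Strong Democrat, Not Str Democrat, Ind,Near Dem
rep_leaning <- 157 + 250 + 192   # Ind,Near Rep, Not Str Republican, Strong Republican
c(democratic = dem_leaning, republican = rep_leaning)
```

Roughly 89% of the respondents who answered the trust question chose “Only Some” or “Hardly Any”.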
The next outputs show the two-way contingency table of the two variables and the so-called mosaic plot of the data.
addmargins(table(gss_subset2012))
## confinan
## partyid A Great Deal Only Some Hardly Any Sum
## Strong Democrat 18 122 99 239
## Not Str Democrat 30 112 91 233
## Ind,Near Dem 14 74 80 168
## Independent 31 114 92 237
## Ind,Near Rep 10 56 37 103
## Not Str Republican 25 100 46 171
## Strong Republican 18 78 29 125
## Other Party 2 16 21 39
## Sum 148 672 495 1315
mosaicplot(table(gss_subset2012), color="lightgreen", main= "Mosaicplot")
The mosaic plot shows noticeable differences among the groups. For example, the more Democratic-leaning a respondent is, the less they tend to trust the financial system, and vice versa. Based on this exploratory analysis, we might expect the two variables to be dependent.
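The pattern in the mosaic plot can be quantified with row proportions, which give the share of each party group falling into each trust category. The following is a minimal sketch that rebuilds the 2012 counts by hand from the contingency table above as a matrix `obs`. `obs` is a hypothetical name, not taken from the original `gss` data objects.

```r
# 2012 counts copied from the contingency table above (rows = partyid, columns = confinan)
obs <- matrix(c(18, 122, 99,   30, 112, 91,   14, 74, 80,   31, 114, 92,
                10,  56, 37,   25, 100, 46,   18, 78, 29,    2,  16, 21),
              ncol = 3, byrow = TRUE,
              dimnames = list(c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem",
                                "Independent", "Ind,Near Rep", "Not Str Republican",
                                "Strong Republican", "Other Party"),
                              c("A Great Deal", "Only Some", "Hardly Any")))
# Within each party group, the share of respondents in each trust category
row_share <- prop.table(obs, margin = 1)
# Share answering "Hardly Any", by party group
round(row_share[, "Hardly Any"], 2)
```

The “Hardly Any” share ranges from about 0.39 to 0.48 in the three Democratic-leaning groups and from about 0.23 to 0.36 in the three Republican-leaning groups. This supports the visual impression, although the trend is not strictly monotonic.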
H0 (nothing going on): Party affiliation and trust in financial institutions are independent. Trust in the banks does not vary by party affiliation.
HA (something going on): Party affiliation and trust in financial institutions are dependent. Trust in the banks does vary by party affiliation.
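# Contingency table with confinan as rows and partyid as columns, plus row/column totals
data <- addmargins(table(subset(gss_subset2012, select = c("confinan", "partyid"))))
# Drop the "Sum" row and column, keeping only the observed counts
observed <- data[1:3, 1:8]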
observed
## partyid
## confinan Strong Democrat Not Str Democrat Ind,Near Dem Independent
## A Great Deal 18 30 14 31
## Only Some 122 112 74 114
## Hardly Any 99 91 80 92
## partyid
## confinan Ind,Near Rep Not Str Republican Strong Republican
## A Great Deal 10 25 18
## Only Some 56 100 78
## Hardly Any 37 46 29
## partyid
## confinan Other Party
## A Great Deal 2
## Only Some 16
## Hardly Any 21
These are the observed counts.
To perform the chi-square test of independence, we need the expected count for each cell of the contingency table under the null hypothesis. It is given by: expected count = (row total × column total) / table total. This is why the table “data” was built with “addmargins”: it provides the row and column totals. The calculation is shown in the following code:
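# Row totals (column 9) times column totals (row 4), divided by the grand total data[4, 9]
expected <- data[1:3, 9] * rbind(data[4, 1:8], data[4, 1:8], data[4, 1:8]) / data[4, 9]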
expected
## Strong Democrat Not Str Democrat Ind,Near Dem Independent
## [1,] 26.90 26.22 18.91 26.67
## [2,] 122.14 119.07 85.85 121.11
## [3,] 89.97 87.71 63.24 89.21
## Ind,Near Rep Not Str Republican Strong Republican Other Party
## [1,] 11.59 19.25 14.07 4.389
## [2,] 52.64 87.39 63.88 19.930
## [3,] 38.77 64.37 47.05 14.681
This is the table of expected counts for each cell.
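The same expected counts can be computed more compactly with `outer()`: take the outer product of the row and column totals and divide it by the grand total. This sketch uses a hand-copied matrix `obs` of the 2012 counts, a hypothetical name, with party groups as rows, so it is transposed relative to `observed`. It also checks a common rule of thumb for the chi-square test, namely that every expected count should be at least 5.

```r
# 2012 counts copied from the contingency table (rows = partyid, columns = confinan)
obs <- matrix(c(18, 122, 99,   30, 112, 91,   14, 74, 80,   31, 114, 92,
                10,  56, 37,   25, 100, 46,   18, 78, 29,    2,  16, 21),
              ncol = 3, byrow = TRUE,
              dimnames = list(c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem",
                                "Independent", "Ind,Near Rep", "Not Str Republican",
                                "Strong Republican", "Other Party"),
                              c("A Great Deal", "Only Some", "Hardly Any")))
# Expected count for each cell: (row total * column total) / table total
expected_counts <- outer(rowSums(obs), colSums(obs)) / sum(obs)
round(expected_counts, 2)
# Cells that violate the "expected count >= 5" rule of thumb
which(expected_counts < 5, arr.ind = TRUE)
```

Only one cell falls below the threshold: “Other Party” × “A Great Deal”, with an expected count of about 4.39. The chi-square approximation, and therefore the p-value, should be read as approximate.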
The differences between the observed and expected counts are shown in the following table:
observed - expected
## partyid
## confinan Strong Democrat Not Str Democrat Ind,Near Dem Independent
## A Great Deal -8.8989 3.776 -4.908 4.326
## Only Some -0.1354 -7.069 -11.852 -7.113
## Hardly Any 9.0342 3.293 16.760 2.787
## partyid
## confinan Ind,Near Rep Not Str Republican Strong Republican
## A Great Deal -1.592 5.754 3.932
## Only Some 3.364 12.614 14.122
## Hardly Any -1.772 -18.369 -18.053
## partyid
## confinan Other Party
## A Great Deal -2.389
## Only Some -3.930
## Hardly Any 6.319
There are fairly large differences between the observed and expected counts. Large deviations from what is expected provide evidence in favour of the alternative hypothesis. To quantify them, we compute the chi-square statistic: square each deviation, divide it by the corresponding expected count, and sum these values over all cells. The result is the test statistic, computed in the following code:
# Pearson chi-square statistic: sum over all cells of (O - E)^2 / E
chi_stat <- sum((observed - expected)^2 / expected)
chi_stat
## [1] 38.74
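In formula form, the statistic and the degrees of freedom used here are as follows. The notation is introduced only for this summary: $O_{ij}$ and $E_{ij}$ are the observed and expected counts in row $i$ and column $j$, $R_i$ and $C_j$ are the row and column totals, $N$ is the grand total, $r = 3$ is the number of trust categories and $c = 8$ is the number of party groups.

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad E_{ij} = \frac{R_i \, C_j}{N},
\qquad \mathrm{df} = (r - 1)(c - 1) = (3 - 1)(8 - 1) = 14
```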
The test statistic is 38.7385. To decide whether there is evidence against the null hypothesis, we need the p-value associated with this statistic. The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. For this we use the chi-square distribution with degrees of freedom equal to (nrows - 1) × (ncols - 1) = (3 - 1) × (8 - 1) = 14. The code subtracts 2 instead of 1 because “data” includes the margins (the row and column totals).
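# Degrees of freedom: subtract 2 because data also contains the "Sum" row and column
df <- (nrow(data) - 2) * (ncol(data) - 2)
# Upper-tail probability of the chi-square distribution
p_value <- pchisq(chi_stat, df, lower.tail = FALSE)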
p_value
## [1] 0.0004002
The p-value is 4.0024 × 10⁻⁴, well below the 5% significance level. We therefore have convincing evidence to reject the null hypothesis and conclude that, in 2012, party affiliation and trust in the financial system are dependent.
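As a cross-check, base R's `chisq.test()` computes the same Pearson statistic directly from a table of counts. No continuity correction is applied for tables larger than 2 × 2. This sketch again uses a hand-copied matrix `obs` of the 2012 counts (a hypothetical name). The warning is suppressed because R warns that the approximation may be incorrect, since one expected count is below 5.

```r
# 2012 counts copied from the contingency table (rows = partyid, columns = confinan)
obs <- matrix(c(18, 122, 99,   30, 112, 91,   14, 74, 80,   31, 114, 92,
                10,  56, 37,   25, 100, 46,   18, 78, 29,    2,  16, 21),
              ncol = 3, byrow = TRUE)
# Pearson chi-square test of independence; warning about small expected counts suppressed
res <- suppressWarnings(chisq.test(obs))
res$statistic   # X-squared, should match chi_stat
res$parameter   # degrees of freedom
res$p.value     # should match p_value
```

The object also holds the expected counts in `res$expected`, which can be compared with the table computed by hand.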
The same calculations are repeated here for 2010.
# Subset and tabulate the 2010 responses in the same way as for 2012
gss_subset2010 <- subset(gss, year == 2010, select = c("partyid", "confinan"))
data2010 <- addmargins(table(subset(gss_subset2010, select = c("confinan", "partyid"))))
observed2010 <- data2010[1:3, 1:8]
expected2010 <- data2010[1:3, 9] * rbind(data2010[4, 1:8], data2010[4, 1:8], data2010[4, 1:8]) / data2010[4, 9]
chi_stat2010 <- sum((observed2010 - expected2010)^2 / expected2010)
# The 2010 table has the same 3 x 8 layout, so df = 14 is reused
p_value2010 <- pchisq(chi_stat2010, df, lower.tail = FALSE)
p_value2010
## [1] 0.3036
The p-value for 2010 is 0.3036, above the 5% significance level. There is therefore not enough evidence to reject the null hypothesis, and the 2010 data are consistent with the two variables being independent. Failing to reject the null does not prove independence.
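The 2010 calculation repeats the 2012 steps line by line. One way to avoid this duplication is to wrap the steps in a function. The sketch below defines a hypothetical helper, `chi_square_independence()`, which is not part of the original analysis. It assumes a two-way table of counts without margins and no all-zero row or column, since a zero expected count would give a division by zero.

```r
# Hypothetical helper: Pearson chi-square test of independence computed by hand
chi_square_independence <- function(tab) {
  tab <- as.matrix(tab)
  # Expected counts: (row total * column total) / table total
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  stat <- sum((tab - expected)^2 / expected)
  df <- (nrow(tab) - 1) * (ncol(tab) - 1)
  list(statistic = stat, df = df,
       p_value = pchisq(stat, df, lower.tail = FALSE),
       expected = expected)
}

# Usage on the 2012 counts, copied from the contingency table (rows = partyid)
obs2012 <- matrix(c(18, 122, 99,   30, 112, 91,   14, 74, 80,   31, 114, 92,
                    10,  56, 37,   25, 100, 46,   18, 78, 29,    2,  16, 21),
                  ncol = 3, byrow = TRUE)
result2012 <- chi_square_independence(obs2012)
result2012$p_value
# With the gss data loaded, the 2010 test could be run as:
# chi_square_independence(table(gss_subset2010$confinan, gss_subset2010$partyid))
```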
Based on the General Social Survey data for 2012, we conclude that stated party affiliation and expressed trust in the financial system are dependent. Can we then generalize that Democratic-leaning voters distrust banks more than Republican-leaning ones? More research is probably needed before drawing that conclusion. In principle, however, given random sampling, the result should generalize to the US population as a whole. Because this is an observational study, the association is not causal: a confounding variable could be driving the difference.
There is an interesting twist: for 2010 we cannot reject the null hypothesis, so the data provide no evidence that the two variables are related. Shortly after the financial crisis, trust in financial institutions therefore appears similar regardless of the respondents' party affiliation. Another interpretation is that respondents of all affiliations distrusted the financial system to a similar degree.