Introduction:

This study investigates the relationship between stated party affiliation and trust in the financial system. Specifically, can we claim that trust in financial institutions is independent of the political inclination of the survey respondents? This is an important question: the party in power can influence the functioning of the financial system in many ways, mainly by introducing new regulations or amending existing ones. The smooth functioning of the banking sector, in turn, strongly affects the economy as a whole and hence everyone's wellbeing.

Data:

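# Keep only the two variables of interest for the 2012 survey year
gss_subset2012 <- subset(gss, year == 2012, select = c("partyid", "confinan"))
# All 2012 respondents, including those with missing answers
cases_with_na <- nrow(gss_subset2012)
# Grand total of the two-way table with margins (row 9, column 4), i.e. complete cases only
cases <- addmargins(table(gss_subset2012))[9,4]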

The 2012 subset contains 1974 cases; however, because not all respondents answered both questions, only 1315 complete cases are available for the analysis.
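The difference between the two counts comes from rows with a missing answer. As a minimal sketch, assuming a made-up data frame `toy` in place of `gss_subset2012` (the values below are invented for illustration only), complete cases could also be counted directly:

```r
# Toy data frame standing in for gss_subset2012 (values are made up).
toy <- data.frame(
  partyid  = c("Independent", NA, "Strong Democrat", "Other Party", "Independent"),
  confinan = c("Only Some", "Hardly Any", NA, "A Great Deal", "Hardly Any")
)

n_all      <- nrow(toy)                 # all rows, including those with NA
n_complete <- sum(complete.cases(toy))  # rows where both answers are present

c(all = n_all, complete = n_complete)
```

`table()` drops rows with `NA` in either variable by default, which is why the grand total of the two-way table equals the number of complete cases.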

Exploratory data analysis:

First we can look at the basic summary statistics for the two variables studied, as well as some basic visualizations of the data.

table(gss_subset2012$partyid)
## 
##    Strong Democrat   Not Str Democrat       Ind,Near Dem 
##                356                343                235 
##        Independent       Ind,Near Rep Not Str Republican 
##                373                157                250 
##  Strong Republican        Other Party 
##                192                 54
table(gss_subset2012$confinan)
## 
## A Great Deal    Only Some   Hardly Any 
##          149          678          498
par(mfrow=c(1,2),cex=.7)
barplot(table(gss_subset2012$partyid),names.arg=levels(gss$partyid), cex.names=.5,col="red", main="Party affiliation", ylab="number")
barplot(table(gss_subset2012$confinan),names.arg=levels(gss$confinan), cex.names=.9,col="lightblue", main="Trust in financial system")

[Figure: bar plots of party affiliation (left) and trust in the financial system (right)]

From these summaries we can conclude that the large majority of respondents have “only some” or “hardly any” trust in financial institutions. Respondents leaning towards the Democratic party also outnumber those leaning towards the Republican party, and there is a sizable group of Independents. These findings are not of particular importance for this study, since we are interested in the dependence between the two variables.

The next outputs show the two-way contingency table of the two variables and the so-called “mosaicplot” of the data.

addmargins(table(gss_subset2012))
##                     confinan
## partyid              A Great Deal Only Some Hardly Any  Sum
##   Strong Democrat              18       122         99  239
##   Not Str Democrat             30       112         91  233
##   Ind,Near Dem                 14        74         80  168
##   Independent                  31       114         92  237
##   Ind,Near Rep                 10        56         37  103
##   Not Str Republican           25       100         46  171
##   Strong Republican            18        78         29  125
##   Other Party                   2        16         21   39
##   Sum                         148       672        495 1315
mosaicplot(table(gss_subset2012), color="lightgreen", main= "Mosaicplot")

[Figure: mosaic plot of party affiliation by trust in the financial system]

From the “mosaicplot” we can see some noticeable differences among the groups. For example, respondents who lean Democratic appear to trust the financial system less than those who lean Republican. Based on this exploratory analysis, we can expect the two variables to be somewhat dependent; whether the differences are statistically significant is tested in the Inference section.
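The pattern seen in the mosaic plot can also be put into numbers. The sketch below re-enters the counts from the contingency table above into a matrix called `counts` (a name chosen here for illustration, not one used in the original analysis) and computes the share of each trust level within each party group:

```r
# Counts copied from the two-way table above (rows: confinan, columns: partyid).
counts <- matrix(
  c( 18,  30, 14,  31, 10,  25, 18,  2,
    122, 112, 74, 114, 56, 100, 78, 16,
     99,  91, 80,  92, 37,  46, 29, 21),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    confinan = c("A Great Deal", "Only Some", "Hardly Any"),
    partyid  = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem",
                 "Independent", "Ind,Near Rep", "Not Str Republican",
                 "Strong Republican", "Other Party")))

# Share of each trust level within each party group (each column sums to 1).
col_share <- prop.table(counts, margin = 2)
round(col_share["Hardly Any", ], 3)
```

In these counts, about 41% of Strong Democrats (99 of 239) answered “Hardly Any”, compared with about 23% of Strong Republicans (29 of 125).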

Inference:

H0 (nothing going on): Party affiliation and trust in the financial institutions are independent. Trust in the banks does not vary by party affiliation.

HA (something going on): Party affiliation and trust in the financial institutions are dependent. Trust in the banks does vary by party affiliation.

Conditions for the chi-square test of independence:

  1. Independence: Sampled observations are independent. We assume simple random sampling, so respondents are independent of each other.
  2. Random sample: according to the GSS documentation, we can assume that respondents were randomly sampled.
  3. Sampling without replacement: n = 1315 is less than 10% of the US population.
  4. Each case contributes to only one cell in the table. That can be assumed from the design of the GSS.
  5. Sample size: Each cell must have at least 5 expected cases. This is met in all cells except one: for “Other Party” at the level “A Great Deal” the expected count is 4.39. Since the shortfall is small and limited to a single cell, I proceed with the method, but the result should be read with that caveat.
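The sample size condition can be checked programmatically. This is a sketch only: it re-enters the observed counts in a matrix called `counts` (an illustrative name) and reads the expected counts from base R's `chisq.test()`. That function warns when any expected count is below 5, so the warning is suppressed here on purpose.

```r
# Observed counts from the contingency table (rows: confinan, columns: partyid).
counts <- matrix(
  c( 18,  30, 14,  31, 10,  25, 18,  2,
    122, 112, 74, 114, 56, 100, 78, 16,
     99,  91, 80,  92, 37,  46, 29, 21),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    confinan = c("A Great Deal", "Only Some", "Hardly Any"),
    partyid  = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem",
                 "Independent", "Ind,Near Rep", "Not Str Republican",
                 "Strong Republican", "Other Party")))

# Expected counts under independence, as computed by chisq.test().
exp_counts <- suppressWarnings(chisq.test(counts))$expected

# Row/column positions of cells with an expected count below 5.
small <- which(exp_counts < 5, arr.ind = TRUE)
small
round(min(exp_counts), 2)
```

Only one of the 24 cells falls below the threshold, which matches the exception noted in condition 5.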
data <- addmargins(table(subset(gss_subset2012,select=c("confinan","partyid"))))
observed <- data[1:3,1:8]
observed
##               partyid
## confinan       Strong Democrat Not Str Democrat Ind,Near Dem Independent
##   A Great Deal              18               30           14          31
##   Only Some                122              112           74         114
##   Hardly Any                99               91           80          92
##               partyid
## confinan       Ind,Near Rep Not Str Republican Strong Republican
##   A Great Deal           10                 25                18
##   Only Some              56                100                78
##   Hardly Any             37                 46                29
##               partyid
## confinan       Other Party
##   A Great Deal           2
##   Only Some             16
##   Hardly Any            21

Here we can see the observed values.

To perform the chi-square test of independence we have to find the expected count for each cell of our contingency table. This can be done with the formula: expected count = (row total × column total) / table total. That is why our table “data” is built with the command “addmargins”: it provides the row and column totals. The calculations can be seen in the following code:

expected <- data[1:3,9]*(rbind(data[4,1:8],data[4,1:8],data[4,1:8]))/data[4,9]
expected
##      Strong Democrat Not Str Democrat Ind,Near Dem Independent
## [1,]           26.90            26.22        18.91       26.67
## [2,]          122.14           119.07        85.85      121.11
## [3,]           89.97            87.71        63.24       89.21
##      Ind,Near Rep Not Str Republican Strong Republican Other Party
## [1,]        11.59              19.25             14.07       4.389
## [2,]        52.64              87.39             63.88      19.930
## [3,]        38.77              64.37             47.05      14.681

That is the table with the expected counts for each cell.
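The same formula can be written more compactly with an outer product of the row and column totals. The following is an alternative sketch, not the code used in this analysis; `counts`, `row_tot`, `col_tot` and `expected_alt` are names introduced here for illustration.

```r
# Observed counts from the contingency table (rows: confinan, columns: partyid).
counts <- matrix(
  c( 18,  30, 14,  31, 10,  25, 18,  2,
    122, 112, 74, 114, 56, 100, 78, 16,
     99,  91, 80,  92, 37,  46, 29, 21),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    confinan = c("A Great Deal", "Only Some", "Hardly Any"),
    partyid  = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem",
                 "Independent", "Ind,Near Rep", "Not Str Republican",
                 "Strong Republican", "Other Party")))

row_tot <- rowSums(counts)   # totals per trust level
col_tot <- colSums(counts)   # totals per party group

# expected count = (row total * column total) / table total, for every cell at once
expected_alt <- outer(row_tot, col_tot) / sum(counts)
round(expected_alt, 2)
```

A useful sanity check is that the expected counts have the same row and column totals as the observed counts.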

The differences between the observed and expected counts can be seen in the following table:

observed - expected
##               partyid
## confinan       Strong Democrat Not Str Democrat Ind,Near Dem Independent
##   A Great Deal         -8.8989            3.776       -4.908       4.326
##   Only Some            -0.1354           -7.069      -11.852      -7.113
##   Hardly Any            9.0342            3.293       16.760       2.787
##               partyid
## confinan       Ind,Near Rep Not Str Republican Strong Republican
##   A Great Deal       -1.592              5.754             3.932
##   Only Some           3.364             12.614            14.122
##   Hardly Any         -1.772            -18.369           -18.053
##               partyid
## confinan       Other Party
##   A Great Deal      -2.389
##   Only Some         -3.930
##   Hardly Any         6.319

From here we can see that there are fairly large differences between the observed and the expected counts. Large deviations from what is expected provide evidence in favour of the alternative hypothesis. To quantify this we compute the chi-square statistic: each deviation is squared and divided by the corresponding expected count, and the resulting values are summed into one number, the test statistic. That can be seen in the following code:
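Before summing, it can be informative to look at how much each cell contributes to the statistic. The sketch below re-enters the observed counts in a matrix called `counts` (an illustrative name) and lists the three cells with the largest contributions:

```r
# Observed counts from the contingency table (rows: confinan, columns: partyid).
counts <- matrix(
  c( 18,  30, 14,  31, 10,  25, 18,  2,
    122, 112, 74, 114, 56, 100, 78, 16,
     99,  91, 80,  92, 37,  46, 29, 21),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    confinan = c("A Great Deal", "Only Some", "Hardly Any"),
    partyid  = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem",
                 "Independent", "Ind,Near Rep", "Not Str Republican",
                 "Strong Republican", "Other Party")))

expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Per-cell contribution to the chi-square statistic: (observed - expected)^2 / expected
contrib <- (counts - expected)^2 / expected

# One row per cell, sorted so the largest contributions come first.
cells <- as.data.frame(as.table(contrib), responseName = "contribution")
cells <- cells[order(-cells$contribution), ]
head(cells, 3)
```

The largest contributions come from the “Hardly Any” row, in particular from the two Republican groups, where far fewer respondents than expected report hardly any trust.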

chi_stat <- sum(((observed-expected)^2)/expected)
chi_stat
## [1] 38.74

We see that the test statistic is 38.7385. To find whether there is evidence against our null hypothesis, we have to find the p-value associated with that statistic. The p-value is the probability of observing a result at least as extreme as this statistic, given that the null hypothesis is true. For that purpose we use the chi-square distribution, where the degrees of freedom are equal to (nrows-1)*(ncols-1). In the code “-2” is used instead of “-1” because our table “data” includes the margins (i.e. totals), which add one extra row and one extra column.
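As a small worked example of the degrees-of-freedom formula, the sketch below uses the table dimensions without margins (3 trust levels by 8 party categories) and also looks up the critical value for a 5% significance level. The variable names are illustrative only.

```r
n_rows <- 3   # trust levels: A Great Deal, Only Some, Hardly Any
n_cols <- 8   # party categories, including Other Party

# Degrees of freedom for a test of independence on a table without margins.
df_manual <- (n_rows - 1) * (n_cols - 1)

# Value the test statistic must exceed to reject H0 at the 5% level.
crit_5pct <- qchisq(0.95, df_manual)

c(df = df_manual, critical = round(crit_5pct, 2))
```

With 14 degrees of freedom, the observed statistic of 38.74 lies well above this critical value, which is consistent with a small p-value.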

# "data" includes a margin row and a margin column, hence -2 instead of -1
df <- (nrow(data) - 2) * (ncol(data) - 2)
p_value <- pchisq(chi_stat, df, lower.tail = FALSE)
p_value
## [1] 0.0004002

As a result our p-value is 4.0024 × 10^-4, which is well below the 5% significance level, so we have convincing evidence to reject the null hypothesis and conclude that, based on this test, the two variables are dependent.
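The manual calculation can be cross-checked against base R's built-in `chisq.test()`. This is a sketch that re-enters the observed counts in a matrix called `counts` (an illustrative name); the warning about small expected counts is suppressed because the one low cell was already discussed under the conditions.

```r
# Observed counts from the contingency table (rows: confinan, columns: partyid).
counts <- matrix(
  c( 18,  30, 14,  31, 10,  25, 18,  2,
    122, 112, 74, 114, 56, 100, 78, 16,
     99,  91, 80,  92, 37,  46, 29, 21),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    confinan = c("A Great Deal", "Only Some", "Hardly Any"),
    partyid  = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem",
                 "Independent", "Ind,Near Rep", "Not Str Republican",
                 "Strong Republican", "Other Party")))

# Pearson chi-square test of independence on the 3 x 8 table.
res <- suppressWarnings(chisq.test(counts))

res$statistic   # test statistic
res$parameter   # degrees of freedom
res$p.value     # p-value
```

The statistic, degrees of freedom and p-value should agree with the values computed by hand above.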

Quick inference for the year 2010:

Here the same calculations are made for the year 2010.

gss_subset2010 <- subset(gss, year == 2010, select = c("partyid", "confinan"))
data2010 <- addmargins(table(subset(gss_subset2010, select = c("confinan", "partyid"))))
# Observed counts and expected counts under independence (same layout as for 2012)
observed2010 <- data2010[1:3,1:8]
expected2010 <- data2010[1:3,9]*(rbind(data2010[4,1:8],data2010[4,1:8],data2010[4,1:8]))/data2010[4,9]
chi_stat2010 <- sum(((observed2010-expected2010)^2)/expected2010)
# df is reused from the 2012 analysis, since the table has the same dimensions
p_value2010 <- pchisq(chi_stat2010, df, lower.tail = FALSE)
p_value2010
## [1] 0.3036

The p-value for 2010 is 0.3036, which is higher than the significance level of 5%. Hence there is not enough evidence to reject the null hypothesis: the 2010 data are consistent with the two variables being independent, although failing to reject the null does not prove independence.
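Since the same calculation is repeated for each year, it could be wrapped in a small helper. The function below, `chi_sq_independence`, is a hypothetical name introduced here and not part of the original analysis. Because the 2010 counts are not printed in this report, the sketch is demonstrated on the 2012 counts, re-entered in a matrix called `counts`.

```r
# Hypothetical helper: chi-square test of independence on a table WITHOUT margins.
chi_sq_independence <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  stat <- sum((tab - expected)^2 / expected)
  df <- (nrow(tab) - 1) * (ncol(tab) - 1)   # computed per table, not reused
  list(statistic = stat, df = df,
       p_value = pchisq(stat, df, lower.tail = FALSE))
}

# 2012 counts from the contingency table (rows: confinan, columns: partyid).
counts <- matrix(
  c( 18,  30, 14,  31, 10,  25, 18,  2,
    122, 112, 74, 114, 56, 100, 78, 16,
     99,  91, 80,  92, 37,  46, 29, 21),
  nrow = 3, byrow = TRUE)

result2012 <- chi_sq_independence(counts)
result2012
```

With the GSS data loaded, the same helper could be applied to a table such as `table(gss_subset2010$confinan, gss_subset2010$partyid)`, which would also avoid relying on the degrees of freedom carried over from 2012.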

Conclusion:

Based on the data from the General Social Survey for the year 2012, we can conclude that stated party affiliation and expressed trust in the financial system are dependent variables. Can we then generalize that Democratic-leaning voters distrust banks more than Republican-leaning voters? Since the GSS is based on a random sample, the association should in principle generalize to the US population as a whole. However, the chi-square test only shows that the variables are associated, not which groups differ or why, so more research is needed before drawing that conclusion. Because this is an observational study, there could also be a confounding variable that causes the difference.

There is an interesting twist here: for the year 2010 we cannot reject the null hypothesis, so the data are consistent with the two variables being independent. For the year just after the financial crisis, we therefore find no evidence that trust in the financial institutions differs by the party affiliation of the respondents. One possible interpretation is that all respondents distrusted the financial system to a similar degree.

References