Is trust in the financial system independent of party affiliation in 2012?

Introduction:

The research question for this study concerns the relationship between stated party affiliation and trust in the financial system. Specifically, can we claim that trust in financial institutions is independent of the political inclination of the survey respondents? The answer matters because the party in power can influence the functioning of the financial system in many ways, mainly by adding or amending regulations. The smooth functioning of the banking sector, in turn, strongly affects the economy as a whole, and hence the wellbeing of everyone.

Data:

Data Collection. The data used in this study come from the General Social Survey (GSS), 1972 - 2012. The data collection for the GSS is fairly complicated. Over the years of collection the methods differed and the methodology changed many times. Most importantly, the sample is not a simple random sample but a combination of different strata and blocks within each stratum. This description of course does not capture the full extent of the process, but it sheds some light on the data behind this assignment. Nevertheless, I will treat the data as if they were collected by simple random sampling. There are many suggestions on how to adjust for the actual design, but that is beyond the scope of this study. For brevity, we will use a subset of the data supplied by the GSS, consisting of two variables, “partyid” and “confinan”. We also keep only the data for the year 2012, since that is the most recent information available. The code can be seen here:

gss_subset2012 <- subset(gss,select = c("partyid", "confinan"),year == 2012)

Cases. Each case is one participant's answers to the General Social Survey in a given year. In this study the questions concern which party the respondent associates with and how much confidence the respondent has in the people running the financial institutions.

cases_with_na <- nrow(gss_subset2012)
cases <- addmargins(table(gss_subset2012))[9,4]

The number of cases is 1974; however, because not all respondents answered both questions, the number of complete cases used in the analysis is 1315.
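The complete-case count can also be obtained without indexing into a margins table by position. A minimal sketch on a small made-up data frame (the “toy” values are hypothetical and only stand in for the two GSS columns):

```r
# Hypothetical stand-in for gss_subset2012: two factor columns with some NAs.
toy <- data.frame(
  partyid  = factor(c("Independent", "Strong Democrat", NA, "Strong Republican")),
  confinan = factor(c("Only Some", NA, "Hardly Any", "A Great Deal"))
)

# All rows, including those with a missing answer.
cases_with_na <- nrow(toy)

# Rows where both questions were answered.
cases <- sum(complete.cases(toy))

# table() drops NA by default, so its grand total equals the complete-case count.
sum(table(toy)) == cases
```

Applied to the real subset, sum(complete.cases(gss_subset2012)) should give the same number as the bottom-right cell of the margins table.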

Variables. The two variables are “partyid” and “confinan”. Both are categorical, and “confinan” is ordinal, since the levels of trust in the financial system can be ordered (“great deal”, “only some”, “hardly any”), although this ordering is not used in this study.
This is an observational study: respondents are sampled from the population and asked survey questions, and nothing is assigned to them by the researcher. As mentioned before, the assumption is that this is a simple random sample, but there is no random assignment.
Scope of inference - generalizability. The population of interest in this study is the US population as a whole. Because we assume a random sample, the findings can be generalized to that population. The sampling itself is a possible source of bias, which could produce some discrepancy with reality. A closer look at the documentation of the GSS reveals that the survey is restricted to the English-speaking population, so strictly the results generalize only to that segment of the US population.
Scope of inference - causality. Causation between the two variables cannot be established from this analysis, because we have only random sampling and not random assignment. We can only generalize the observed association.

Exploratory data analysis:

First we can look at the basic summary statistics for the two variables studied, as well as some basic visualizations of the data.

table(gss_subset2012$partyid)

## 
##    Strong Democrat   Not Str Democrat       Ind,Near Dem 
##                356                343                235 
##        Independent       Ind,Near Rep Not Str Republican 
##                373                157                250 
##  Strong Republican        Other Party 
##                192                 54

table(gss_subset2012$confinan)

## 
## A Great Deal    Only Some   Hardly Any 
##          149          678          498

par(mfrow=c(1,2),cex=.7)
barplot(table(gss_subset2012$partyid),names.arg=levels(gss$partyid), cex.names=.5,col="red", main=c("Party affiliation"), ylab="number")
barplot(table(gss_subset2012$confinan),names.arg=levels(gss$confinan), cex.names=.9,col="lightblue", main=c("Trust in financial system"))

[Figure: bar plots of party affiliation and of trust in the financial system]

From these plots we can conclude that the large majority of respondents have “only some” or “hardly any” trust in the financial institutions. Respondents leaning towards the Democratic party outnumber those leaning towards the Republican party, and there is a sizable group of Independents. These findings are not the focus of this study, since we are interested in the dependence between the two variables.

The next outputs show the two-way contingency table of the two variables and the so-called “mosaicplot” of the data.

addmargins(table(gss_subset2012))

##                     confinan
## partyid              A Great Deal Only Some Hardly Any  Sum
##   Strong Democrat              18       122         99  239
##   Not Str Democrat             30       112         91  233
##   Ind,Near Dem                 14        74         80  168
##   Independent                  31       114         92  237
##   Ind,Near Rep                 10        56         37  103
##   Not Str Republican           25       100         46  171
##   Strong Republican            18        78         29  125
##   Other Party                   2        16         21   39
##   Sum                         148       672        495 1315

mosaicplot(table(gss_subset2012), color="lightgreen", main= "Mosaicplot")

[Figure: mosaic plot of party affiliation against trust in the financial system]

From the “mosaicplot” we can see some noticeable differences among the groups. For example, the more Democratic-leaning a respondent is, the less they appear to trust the financial system, and vice versa. Based on this exploratory analysis, we can expect the two variables to be somewhat dependent.
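The pattern read off the mosaic plot can be put into numbers by looking at the share of each trust level within each party group. A minimal sketch, assuming the counts are typed in by hand from the two-way table above (the names “counts” and “row_shares” are introduced here for illustration):

```r
# Counts copied from the two-way table above (partyid in rows, confinan in columns).
counts <- matrix(
  c(18, 122, 99,
    30, 112, 91,
    14,  74, 80,
    31, 114, 92,
    10,  56, 37,
    25, 100, 46,
    18,  78, 29,
     2,  16, 21),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    partyid  = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem",
                 "Independent", "Ind,Near Rep", "Not Str Republican",
                 "Strong Republican", "Other Party"),
    confinan = c("A Great Deal", "Only Some", "Hardly Any")
  )
)

# Share of each trust level within each party group (every row sums to 1).
row_shares <- prop.table(counts, margin = 1)

# Share answering "Hardly Any", by party group.
round(row_shares[, "Hardly Any"], 3)
```

In these counts the “Hardly Any” share is about 41% among Strong Democrats (99 of 239) and about 23% among Strong Republicans (29 of 125), which matches the visual impression from the plot.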

Inference:

Hypothesis.

H0 (nothing going on): Party affiliation and the trust in the financial institutions are independent. The trust in the banks does not vary by party association.

HA (something going on): Party affiliation and the trust in the financial institutions are dependent. The trust in the banks does vary by party association.

Checking the Conditions

Independence: Sampled observations are independent, since we assume simple random sampling, so respondents are independent of each other.
Random sample: as stated in the Data section, we treat the GSS data as a random sample.
Sampling is without replacement, and n = 1315 is less than 10% of the US population.
Each case contributes to only one cell in the table. That follows from the design of the GSS, where each respondent gives one answer per question.
Sample size: Each cell must have at least 5 expected cases. This is met with one small exception: the “Other Party” category has an expected count of 4.39 for the level “A Great Deal”. I ignore this minor violation and conclude that the method can be applied.
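The sample-size condition can be checked directly by computing the expected counts under independence. A minimal sketch, assuming the observed counts are typed in by hand from the two-way table shown earlier (the name “obs” is introduced here for illustration):

```r
# Observed counts copied from the two-way table shown earlier
# (rows: confinan, columns: partyid). matrix() fills column by column.
obs <- matrix(
  c(18, 122, 99,   30, 112, 91,   14, 74, 80,   31, 114, 92,
    10,  56, 37,   25, 100, 46,   18, 78, 29,    2,  16, 21),
  nrow = 3,
  dimnames = list(
    confinan = c("A Great Deal", "Only Some", "Hardly Any"),
    partyid  = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem",
                 "Independent", "Ind,Near Rep", "Not Str Republican",
                 "Strong Republican", "Other Party")
  )
)

# Expected count under independence: row total * column total / table total.
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

min(expected)       # smallest expected count
sum(expected < 5)   # number of cells below 5
mean(expected < 5)  # share of cells below 5
```

Only one of the 24 cells falls below 5, and only slightly. A commonly cited rule of thumb tolerates a small share of such cells, which is in line with the decision to proceed.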

The method used. The method used in this study is the chi-square test of independence. It is appropriate here because we have two categorical variables, both with more than two levels. With this method we can test whether the proportions of the different levels of one variable change significantly across the levels of the other variable. To illustrate how this works we first load the information into a new matrix called “data”:

data <- addmargins(table(subset(gss_subset2012,select=c("confinan","partyid"))))
observed <- data[1:3,1:8]
observed

##               partyid
## confinan       Strong Democrat Not Str Democrat Ind,Near Dem Independent
##   A Great Deal              18               30           14          31
##   Only Some                122              112           74         114
##   Hardly Any                99               91           80          92
##               partyid
## confinan       Ind,Near Rep Not Str Republican Strong Republican
##   A Great Deal           10                 25                18
##   Only Some              56                100                78
##   Hardly Any             37                 46                29
##               partyid
## confinan       Other Party
##   A Great Deal           2
##   Only Some             16
##   Hardly Any            21

Here we can see the observed values.

To perform the chi-square test of independence we have to find the expected count for each cell of our contingency table. That can be done with the formula: expected count = (row total * column total) / table total. That is why our table “data” is built with the command “addmargins”: to have the row and column totals. The calculations can be seen in the following code:

expected <- data[1:3,9]*(rbind(data[4,1:8],data[4,1:8],data[4,1:8]))/data[4,9]
expected

##      Strong Democrat Not Str Democrat Ind,Near Dem Independent
## [1,]           26.90            26.22        18.91       26.67
## [2,]          122.14           119.07        85.85      121.11
## [3,]           89.97            87.71        63.24       89.21
##      Ind,Near Rep Not Str Republican Strong Republican Other Party
## [1,]        11.59              19.25             14.07       4.389
## [2,]        52.64              87.39             63.88      19.930
## [3,]        38.77              64.37             47.05      14.681

That is the table with the expected counts for each cell.

The difference between the observed and the expected counts can be seen in the following table:

observed - expected

##               partyid
## confinan       Strong Democrat Not Str Democrat Ind,Near Dem Independent
##   A Great Deal         -8.8989            3.776       -4.908       4.326
##   Only Some            -0.1354           -7.069      -11.852      -7.113
##   Hardly Any            9.0342            3.293       16.760       2.787
##               partyid
## confinan       Ind,Near Rep Not Str Republican Strong Republican
##   A Great Deal       -1.592              5.754             3.932
##   Only Some           3.364             12.614            14.122
##   Hardly Any         -1.772            -18.369           -18.053
##               partyid
## confinan       Other Party
##   A Great Deal      -2.389
##   Only Some         -3.930
##   Hardly Any         6.319

From here we can see that some observed counts differ considerably from the expected counts. Large deviations from what is expected provide evidence in favour of the alternative hypothesis. To quantify that, we compute the chi-square statistic: square each deviation, divide it by the corresponding expected count, and sum the results into one number, the test statistic. That can be seen with the following code:

chi_stat <- sum(((observed-expected)^2)/expected)
chi_stat

## [1] 38.74
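To see which cells drive the statistic, the sum can be broken down into its per-cell contributions. A minimal sketch, assuming the observed counts are typed in by hand from the table above (the names “obs”, “contrib” and “top” are introduced here for illustration):

```r
# Observed counts copied from the table above (rows: confinan, columns: partyid).
obs <- matrix(
  c(18, 122, 99,   30, 112, 91,   14, 74, 80,   31, 114, 92,
    10,  56, 37,   25, 100, 46,   18, 78, 29,    2,  16, 21),
  nrow = 3,
  dimnames = list(
    confinan = c("A Great Deal", "Only Some", "Hardly Any"),
    partyid  = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem",
                 "Independent", "Ind,Near Rep", "Not Str Republican",
                 "Strong Republican", "Other Party")
  )
)

# Expected counts under independence.
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Contribution of each cell to the chi-square statistic.
contrib <- (obs - expected)^2 / expected
round(contrib, 2)

# Position of the largest contribution (row index, column index).
top <- which(contrib == max(contrib), arr.ind = TRUE)
rownames(contrib)[top[1, 1]]
colnames(contrib)[top[1, 2]]
```

The contributions add up to the test statistic, and the largest ones sit in the “Hardly Any” row for the two Republican groups, where the observed counts are well below the expected ones.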

We see that the test statistic is 38.7385. To find whether there is evidence against our null hypothesis, we need the p-value associated with that statistic. The p-value is the probability of observing a statistic at least as extreme as this one, given that the null hypothesis is true. For that purpose we use the chi-square distribution with (nrows-1)*(ncols-1) degrees of freedom, here (3-1)*(8-1) = 14. In the code “-2” is used instead of “-1”, because our table “data” includes the margins (i.e. totals).

df <- (nrow(data) - 2) * (ncol(data) - 2)
p_value <- pchisq(chi_stat, df, lower.tail = FALSE)
p_value

## [1] 0.0004002

As a result our p-value is 4.0024 × 10^-4, well below the 5% significance level, so we have convincing evidence to reject the null hypothesis and conclude that, based on this test, the two variables are dependent.
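The manual calculation can be cross-checked against R's built-in chisq.test(). A minimal sketch, assuming the observed counts are typed in by hand from the table above (the names “obs”, “res”, “dof” and “critical” are introduced here for illustration):

```r
# Observed counts copied from the table above (rows: confinan, columns: partyid).
obs <- matrix(
  c(18, 122, 99,   30, 112, 91,   14, 74, 80,   31, 114, 92,
    10,  56, 37,   25, 100, 46,   18, 78, 29,    2,  16, 21),
  nrow = 3,
  dimnames = list(
    confinan = c("A Great Deal", "Only Some", "Hardly Any"),
    partyid  = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem",
                 "Independent", "Ind,Near Rep", "Not Str Republican",
                 "Strong Republican", "Other Party")
  )
)

# chisq.test() warns because one expected count is below 5;
# the warning is suppressed here since that cell was discussed earlier.
res <- suppressWarnings(chisq.test(obs))

stat <- unname(res$statistic)  # chi-square statistic
dof  <- unname(res$parameter)  # degrees of freedom
p    <- res$p.value            # p-value

# Critical value at the 5% significance level for comparison.
critical <- qchisq(0.95, df = dof)

c(statistic = stat, df = dof, p_value = p, critical_5pct = critical)
```

The statistic lies far above the 5% critical value, which is the same conclusion as the one reached with pchisq().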

Quick inference for the year 2010

Here the same calculations are made for the year 2010.

gss_subset2010 <- subset(gss,select=c("partyid", "confinan"),year == 2010)
data2010 <- addmargins(table(subset(gss_subset2010,select=c("confinan","partyid"))))
observed2010 <- data2010[1:3, 1:8]
expected2010 <- outer(data2010[1:3, 9], data2010[4, 1:8]) / data2010[4, 9]
chi_stat2010 <- sum((observed2010 - expected2010)^2 / expected2010)
p_value2010 <- pchisq(chi_stat2010,df,lower.tail=F)
p_value2010

## [1] 0.3036

The p-value for 2010 is 0.3036, which is above the 5% significance level, so there is not enough evidence to reject the null hypothesis. The 2010 data are therefore consistent with the two variables being independent, although failing to reject the null does not prove independence.
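Because the same calculation is repeated for each year, it could be wrapped in a small function. The sketch below uses a hypothetical helper, chi_sq_independence(), that is not part of the original analysis, and tries it on a made-up 2x2 table rather than on the GSS data:

```r
# Hypothetical helper: chi-square test of independence for a two-way
# table of counts WITHOUT margins (so "-1" is used for the degrees of freedom).
chi_sq_independence <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  stat <- sum((tab - expected)^2 / expected)
  df <- (nrow(tab) - 1) * (ncol(tab) - 1)
  list(
    statistic = stat,
    df = df,
    p_value = pchisq(stat, df, lower.tail = FALSE)
  )
}

# Made-up counts, only to exercise the function.
toy_tab <- matrix(c(10, 20, 30, 40), nrow = 2)
res <- chi_sq_independence(toy_tab)
res

# Cross-check against the built-in test (no continuity correction).
builtin <- chisq.test(toy_tab, correct = FALSE)
all.equal(res$statistic, unname(builtin$statistic))
```

With the “gss” data frame loaded, calling the helper on table(gss_subset2010) should reproduce the 2010 figures, since a table without margins is what the function expects.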

Conclusion:

Based on the data from the General Social Survey for the year 2012, we can conclude that stated party affiliation and expressed trust in the financial system are dependent variables. Can we then generalize that Democratic-leaning voters distrust banks more than Republican-leaning voters? Under the random-sampling assumption the association generalizes to the US population as a whole, but the test only shows that some dependence exists, not its direction, so more research is needed before drawing that conclusion. There could also be a confounding variable behind the difference.

There is an interesting twist here: for the year 2010 we cannot reject the null hypothesis, so the data are consistent with the two variables being independent. For the year just after the financial crisis, then, we find no evidence that trust in the financial institutions differed by the party association of the respondents. One possible interpretation is that all respondents distrusted the financial system about equally.

References

Data used in this study can be obtained through the following link: GSS Data.
A codebook with all the variables from the General Social Survey, describing how they are constructed and what values they can take, can be found at the following link: Codebook.