Setup

Load packages

## Warning: package 'ggplot2' was built under R version 4.0.2
## Warning: package 'dplyr' was built under R version 4.0.2

Load data

Make sure your data and R Markdown files are in the same directory. When loaded, your data file will be called gss. Delete this note before you submit your work.


Part 1: Data

The official website describes the survey as follows: “Since 1972, the General Social Survey (GSS) has provided politicians, policymakers, and scholars with a clear and unbiased perspective on what Americans think and feel about such issues as national spending priorities, crime and punishment, intergroup relations, and confidence in institutions”. (https://gss.norc.org/)

The dataset is an extract of the General Social Survey (GSS) Cumulative File 1972-2012 that provides a sample of selected indicators.

The survey is conducted by the National Opinion Research Center (NORC), founded in 1941.

According to the website's documentation: “The data comes from General Social Surveys, interviews administered to national NORC samples using a standard questionnaire”.

The results cannot be used to establish causality. Why not? Because the subjects were not randomly assigned to groups, the data are observational and cannot answer causal questions.

Since the sample is national, the results can be generalized to the American population, and inference about that population is possible.


Part 2: Research question

In the search for a research question, I looked through the codebook and came across two variables that I found interesting:

  1. Race: my master's degree research was about racial quotas in Brazil. (https://docplayer.com.br/110810534-Gregorio-unbehaun-leal-da-silva.html)

  2. Trust in banks and financial institutions: this seemed timely given Rosana Pinheiro-Machado's research on bank credit. This renowned researcher is a fellow of the UK Higher Education Academy. I read her work often, and I find her research on access to financial services among the lower classes very interesting. (http://rosanapinheiromachado.com.br/en/academic-papers-english/)

Combining these two interests, I decided to investigate in the GSS whether there is any relationship between a respondent's race and their view of the banking and financial system. Given the nature of the data (see Part 1), causality cannot be analyzed, so the focus is on whether the data provide reliable evidence that the race of Americans is associated with how they evaluate banks and financial institutions.

A quick Google search for this relationship turned up a Reuters article from August 13, 2019 reporting a study that points to a relationship between the variables of interest, so I thought it reasonable to continue the analysis.

The article (https://www.reuters.com/article/us-usa-banks-race/african-americans-underserved-by-u-s-banks-study-idUSKCN1V3081) states:

“Many African Americans have difficulty accumulating savings in part because they lack access to mainstream financial services like banking, a new study on the contributing factors to the U.S. racial wealth gap by McKinsey & Co found on Tuesday.”

To test this question, we compare only White and Black respondents. Using the subset function, we discard opinions that are neither strongly positive nor strongly negative. Respondents with an intermediate confidence level are outside the scope of this lab, given the material covered in this module.

$H_0: p_{black} = p_{white}$

$H_A: p_{black} \neq p_{white}$

Here $p$ denotes the population proportion of respondents reporting “A Great Deal” of confidence in banks and financial institutions.

Variables: CONFINAN and RACE


Part 3: Exploratory data analysis

The first thing I did was create ‘q3’ by isolating the variables I want to analyze and keeping only 2012, the most recent year. I also chose to omit unanswered items (NAs).
We chose to reduce each variable to two categories: “White” v. “Black” for race, and “A Great Deal” v. “Hardly Any” for confidence. Other races are therefore excluded from the analysis, as are respondents with an intermediate level of trust (neither trust nor distrust). This yields a smaller sample, but it lets us focus on the two races of interest and on positions that are strongly favorable or strongly unfavorable toward the variable under investigation (trust in banks and the financial system).

## [1] 592   3

The output above gives the dimensions of ‘q3’ after running the droplevels function, which leaves the confidence variable with only two levels.

For the race variable it was not necessary to run the function, but it also ends up with only two categories.
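The filtering step described above can be sketched as follows. This is a minimal illustration on a tiny made-up data frame (`toy` and `toy_q3` are hypothetical names, and the rows are not real GSS records); the “Only Some” and “Other” levels are assumptions about how the intermediate and excluded categories are labeled. It shows why droplevels matters: filtering rows removes the observations but leaves the unused factor levels behind.

```r
# Toy stand-in for the gss data frame (hypothetical rows, not real GSS data)
toy <- data.frame(
  year = c(2012, 2012, 2012, 2012, 2010, 2012),
  race = factor(c("White", "Black", "Other", "White", "Black", "Black"),
                levels = c("White", "Black", "Other")),
  confinan = factor(c("A Great Deal", "Hardly Any", "Only Some",
                      "Only Some", "A Great Deal", NA),
                    levels = c("A Great Deal", "Only Some", "Hardly Any"))
)

# Keep 2012, the two races of interest, and the two extreme confidence levels.
# NA %in% ... evaluates to FALSE, so unanswered rows are dropped as well.
toy_q3 <- subset(
  toy,
  year == 2012 &
    race %in% c("White", "Black") &
    confinan %in% c("A Great Deal", "Hardly Any"),
  select = c(year, race, confinan)
)

# Filtering rows does not remove unused factor levels
print(levels(toy_q3$confinan))

# droplevels() discards the levels that no longer occur in the data
toy_q3 <- droplevels(toy_q3)
print(levels(toy_q3$confinan))
print(dim(toy_q3))
```

Applied to the real data, the same pattern would produce a data frame with the three columns year, race and confinan.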

There are 592 cases and 3 variables. The next step is to look at a summary of ‘q3’:

##       year         race             confinan  
##  Min.   :2012   White:496   A Great Deal:135  
##  1st Qu.:2012   Black: 96   Hardly Any  :457  
##  Median :2012                                 
##  Mean   :2012                                 
##  3rd Qu.:2012                                 
##  Max.   :2012

The summary above indicates that the data selection was successful: it contains only 2012 data, and race and confidence level each have only the two categories relevant to the research question.

Next, we build a table of proportions:

##               
##                    White     Black
##   A Great Deal 0.2177419 0.2812500
##   Hardly Any   0.7822581 0.7187500

The way to interpret the table above is that, for each column (‘White’ and ‘Black’), it gives the proportion of respondents of that race at each confidence level (‘A Great Deal’ and ‘Hardly Any’). In other words, each column adds up to 1.
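As a small sketch of how such a column-wise table can be produced, the block below rebuilds it from the success and failure counts reported in this document (27 and 69 for Black respondents, 108 and 388 for White respondents). The object names `counts` and `col_props` are illustrative, and this is not necessarily the exact call used in the original analysis.

```r
# Counts reported in this document; matrix() fills column by column
counts <- matrix(
  c(108, 388,   # White: A Great Deal, Hardly Any
     27,  69),  # Black: A Great Deal, Hardly Any
  nrow = 2,
  dimnames = list(confinan = c("A Great Deal", "Hardly Any"),
                  race = c("White", "Black"))
)

# margin = 2 divides each cell by its column total,
# so the proportions are conditional on race
col_props <- prop.table(counts, margin = 2)
print(round(col_props, 4))

# Each column should sum to 1
print(colSums(col_props))
```

Using `margin = 1` instead would condition on the confidence level, which answers a different question (the racial composition of each confidence group).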

An easier graphic representation can be seen below:

The graphic shows more clearly that there seems to be some association; the inference will make it possible to answer with more certainty.

To conclude this EDA section, the following script will calculate the necessary sample statistics:

## [1] 96
## [1] 496

The first number is the total number of Black respondents in ‘q3’, and the second is the number of White respondents in ‘q3’. These numbers were already visible above in the output of summary(q3). The next step is to calculate the number of successes (note: success = a high confidence level, i.e. “A Great Deal”) and the sample proportion for each race.

## [1] 0.281
## [1] 0.218

The sample proportion is higher for Black respondents ($\hat{p}_{black} = 0.281$) than for White respondents ($\hat{p}_{white} = 0.218$). But our alternative hypothesis (HA) is only that $p_{black}$ is different from $p_{white}$. Hypotheses are always about the population, and so far we only have the $\hat{p}$ values, which are point estimates from a sample. Inference will let us construct, with 95% confidence, a range of plausible values for the difference. Since the difference between the $\hat{p}$ values is small, the tests may not find it significant; the inference will check whether this is the case.


Part 4: Inference

Remember:

$H_0: p_{black} = p_{white}$

$H_A: p_{black} \neq p_{white}$

We now test these hypotheses using inference.

Confidence Level = 95%

Having defined our confidence level, we will ask the program to make the inference.

Inference can be made in two ways; the first is shown below.

Method One: The null hypothesis can be written as $p_{black} - p_{white} = 0$. Since we are comparing two groups on a binary categorical variable, we build a 95% confidence interval for the difference in proportions and check whether zero falls inside it. This lets us answer the question: is there convincing evidence that Black and White respondents differ in their confidence in banks and the financial system? With Method One, the answer is no, as the interval bounds below show:

## [1] 0.1599809
## [1] -0.03398088

This conclusion follows from the fact that 0 falls within the generated interval (-0.03, 0.16). We are 95% confident that the true difference $p_{black} - p_{white}$ lies in this interval, so the data do not provide convincing evidence of an association. Method Two generates a p-value.
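A minimal sketch of the interval calculation, using the counts reported in this document and the usual unpooled standard error for a difference in proportions. Variable names are illustrative. The bounds can differ from the printed output in the third decimal place, since that output appears to have been computed from proportions rounded to three decimals and a critical value of 1.96.

```r
# Sample sizes and successes ("A Great Deal") reported in this document
n_black <- 96;  x_black <- 27
n_white <- 496; x_white <- 108

p_black <- x_black / n_black
p_white <- x_white / n_white

# Unpooled standard error: each group uses its own sample proportion
se <- sqrt(p_black * (1 - p_black) / n_black +
           p_white * (1 - p_white) / n_white)

# Critical value for a 95% confidence level
z_star <- qnorm(0.975)

# Point estimate plus/minus the margin of error
diff_hat <- p_black - p_white
ci <- diff_hat + c(-1, 1) * z_star * se
print(round(ci, 3))

# The interval contains 0 when the bounds have opposite signs
print(ci[1] < 0 && ci[2] > 0)
```

The unpooled standard error is used here because a confidence interval does not assume the two proportions are equal.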

First step: check the conditions for a hypothesis test comparing two proportions.

  • We have two simple random samples from large populations. Here “large” means that the population is at least 20 times larger than the sample.
  • The individuals in our samples have been chosen independently of one another. The populations themselves must also be independent.
  • There are at least 10 successes and 10 failures in each of our samples (for both Black and White respondents; see below).

## [1] 27
## [1] 69
## [1] 108
## [1] 388

All of the above values are greater than 10, so the last condition is met and we can move on to inference.

## Response variable: categorical (2 levels, success: A Great Deal)
## Explanatory variable: categorical (2 levels) 
## n_White = 496, p_hat_White = 0.2177
## n_Black = 96, p_hat_Black = 0.2812
## H0: p_White =  p_Black
## HA: p_White != p_Black
## z = -1.3575
## p_value = 0.1746

As expected, the results from Method Two confirm the findings from Method One. With a p-value this high (0.1746), it is not possible to reject the null hypothesis.

Since the p-value is not below the 0.05 significance level, we fail to reject $H_0$. Method One and Method Two lead to the same result, which gives us confidence in the conclusion.
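The test statistic and p-value can also be reproduced by hand. The sketch below uses the counts reported in this document and the pooled standard error that applies under the null hypothesis; it then cross-checks the result against base R's prop.test without continuity correction. Variable names are illustrative, and this is a hand calculation rather than the inference call that produced the output above.

```r
# Sample sizes and successes ("A Great Deal") reported in this document
n_black <- 96;  x_black <- 27
n_white <- 496; x_white <- 108

# Success-failure condition: at least 10 successes and 10 failures per group
cells <- c(x_black, n_black - x_black, x_white, n_white - x_white)
print(all(cells >= 10))

# Under H0 both groups share one proportion, so the samples are pooled
p_pool <- (x_black + x_white) / (n_black + n_white)
se_pool <- sqrt(p_pool * (1 - p_pool) * (1 / n_black + 1 / n_white))

# z statistic for p_white - p_black, the same direction as the output above
z <- (x_white / n_white - x_black / n_black) / se_pool

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z))
print(round(c(z = z, p_value = p_value), 4))

# Cross-check: without continuity correction, the chi-squared statistic
# from prop.test equals z^2 and the p-value is the same
check <- prop.test(x = c(x_white, x_black), n = c(n_white, n_black),
                   correct = FALSE)
print(all.equal(unname(check$statistic), z^2))
```

The pooled proportion is used only for the hypothesis test; the confidence interval in Method One relies on the separate sample proportions.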

We conclude that the data do not provide sufficient evidence to affirm that race is associated with trust in banks and financial institutions.