Introduction
Racism is a worldwide problem that has affected cultures around the planet. Nowadays, marginalization is no longer considered tolerable, but it is still an issue that affects our culture. In the past, Black people were discriminated against to the point that they were not allowed to vote or to have any role in society other than slavery. In the 1950s, Martin Luther King Jr. began a struggle against this problem in the United States to protect the human rights of Black people. Since then, racism has decreased considerably, but it has not been completely eradicated from our society. One example of present-day discrimination concerns job offers: some employers do not hire people with black skin. With this in mind, this project aims to determine whether there is a correlation between skin color and the chances of getting a job in the United States. We therefore ask: Is there any relationship between race (race) and the ease of getting an equivalent job (jobfind) in the United States? The rest of this report studies this question.
To conduct the study, we use the General Social Survey (GSS) Cumulative File dataset provided by Coursera. The GSS is a sociological survey used to collect data on the demographic characteristics and attitudes of residents of the United States. It is conducted by the National Opinion Research Center at the University of Chicago through face-to-face, in-person interviews with adults (18+) in randomly selected households. The data collected in this survey include both demographic information and respondents' opinions on matters ranging from government spending to the state of race relations to the existence and nature of God. Because of the wide range of topics covered and the comprehensive gathering of demographic information, the survey results allow social scientists to correlate demographic factors such as age, race, gender, and urban/rural upbringing with beliefs. The cumulative file covers surveys conducted between 1972 and 2012. It can be found here: https://d396qusza40orc.cloudfront.net/statistics%2Fproject%2Fgss1.html.
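The cases in this study are individual people, since all the information comes from the respondents of the GSS. The two variables used to answer the research question stated earlier are:
race: race of the respondent (categorical: White, Black, Other).
unemp: whether the respondent has been unemployed in the last ten years (categorical: Yes, No).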
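This is an observational study: researchers simply collect data on what they see and hear and draw inferences from the data collected, without assigning any treatment. Because respondents come from randomly selected households, the results can be generalized to adults in the United States; however, since there is no random assignment, the study can show an association between the variables but not a causal relationship.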
This project is intended to reach people from different cultures (ethnicities, religions, etc.), since racism is a problem that affects society as a whole. However, the results may be of special interest to Black American citizens, since they will show whether there is a statistical relationship between skin color and unemployment. At the end of the study, it will be possible to determine whether White or Black Americans are more likely to have been unemployed.
Data
Now it is time to start the data analysis. First, we load the dataset:
load(url("http://bit.ly/dasi_gss_data"))
Next, we show the summaries of the two variables we will be working with:
summary(gss$race)
## White Black Other
## 46350 7926 2785
This is the race of the survey respondent. People were classified as White, Black, or Other.
summary(gss$unemp)
## Yes No NA's
## 10990 24517 21554
This variable, listed under standard of living, records whether the respondent was ever unemployed in the last ten years. Note that 21554 responses are missing (NA).
table(gss$unemp, gss$race)
##
## White Black Other
## Yes 8636 1710 644
## No 20905 2666 946
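Total number of White respondents with an answer for unemp: \(n_1=29541\). Proportion of White respondents who were unemployed: \(\hat{p}_1=8636/29541=0.2923\) (29.23%).
Total number of Black respondents with an answer for unemp: \(n_2=4376\). Proportion of Black respondents who were unemployed: \(\hat{p}_2=1710/4376=0.3908\) (39.08%).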
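As a quick check of the arithmetic above, the following R sketch recomputes \(n_1\), \(n_2\) and the two sample proportions. To keep it self-contained, it hard-codes the counts printed by `table(gss$unemp, gss$race)` instead of reading `gss`; the variable names are illustrative only.

```r
# Counts copied from the table(gss$unemp, gss$race) output above
unemployed <- c(white = 8636, black = 1710)   # unemp == "Yes"
employed   <- c(white = 20905, black = 2666)  # unemp == "No"

# Group sizes: respondents with a non-missing answer for unemp
n <- unemployed + employed
# Sample proportions of respondents who have been unemployed
p_hat <- unemployed / n

print(n)                # white 29541, black 4376
print(round(p_hat, 4))  # white 0.2923, black 0.3908
```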
The next graph plots race against unemployment:
plot(gss$race ~ gss$unemp)
The following bar chart shows essentially the same information in a different way:
barplot(table(gss$unemp, gss$race))
The graphs above suggest that there is a relationship between skin color and unemployment. At first sight, a smaller proportion of White respondents than of other races report having been unemployed in the last ten years. However, it is not yet possible to draw strong conclusions from this information alone.
Inference
We will use a hypothesis test. The null hypothesis states that there is no relationship between being Black and the chances of getting a job, i.e., the proportions of unemployed White and Black people are equal. The alternative hypothesis states that there is a relationship, i.e., the proportion of unemployed Black people is different from the proportion of unemployed White people.
\(H_0:p_{white}=p_{black}\)
\(H_A:p_{white}\neq p_{black}\)
We will use a hypothesis test for comparing two proportions.
Before we continue, we have to check the conditions and calculate the test statistic. Since the null hypothesis assumes that the two population proportions are equal, we estimate their common value with the pooled proportion, given by
\(\hat{p}_{pool}=\frac{\text{successes}_{white}+\text{successes}_{black}}{n_1+n_2}\)
\(\hat{p}_{pool}=\frac{8636+1710}{29541+4376}=0.305\)
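The pooled proportion can be reproduced with a few lines of R. This is a minimal sketch that hard-codes the counts from the table above (the names `successes` and `n` are illustrative), where a "success" is a respondent who has been unemployed.

```r
# Unemployed respondents ("successes") and group sizes, taken from the table above
successes <- c(white = 8636, black = 1710)
n         <- c(white = 29541, black = 4376)

# Pooled proportion: total successes divided by total sample size
p_pool <- sum(successes) / sum(n)
round(p_pool, 3)  # 0.305
```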
Here, an individual counts as a “success” if they have been unemployed.
The independence condition within groups is met: the GSS uses random samples, and \(n_1\) and \(n_2\) are each less than 10% of the White and Black populations, respectively.
There is no reason to expect sampled white and black people to be dependent, so the sample is independent between groups too.
The sample size/skew (success-failure) condition is met for both the White and Black groups:
\(n_1\hat{p}_{pool}=29541\times 0.305\approx 9010\geq 10\)
\(n_1(1-\hat{p}_{pool})=29541\times 0.695\approx 20531\geq 10\)
\(n_2\hat{p}_{pool}=4376\times 0.305\approx 1335\geq 10\)
\(n_2(1-\hat{p}_{pool})=4376\times 0.695\approx 3041\geq 10\)
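The four success-failure checks can also be done in one step. The sketch below, which hard-codes the sample sizes and unemployed counts from the table, builds the expected successes and failures under \(H_0\) for each group and confirms they are all at least 10.

```r
n      <- c(white = 29541, black = 4376)
p_pool <- (8636 + 1710) / sum(n)

# Expected successes and failures in each group under H0
expected <- rbind(successes = n * p_pool,
                  failures  = n * (1 - p_pool))
print(round(expected))

# The condition requires every expected count to be at least 10
all(expected >= 10)  # TRUE
```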
Therefore, we can assume that the sampling distribution of the difference between the two sample proportions is nearly normal.
Now we conduct a hypothesis test at the 5% significance level to evaluate whether White and Black people are equally likely to have been unemployed.
\(H_0:p_{white}-p_{black}=0\)
\(H_A:p_{white}-p_{black}\neq 0\)
Under \(H_0\), \(\hat{p}_{white}-\hat{p}_{black}\sim N(\text{mean}=0,\ SE)\)
Using the pooled proportion, the standard error is
\(SE=\sqrt{\frac{\hat{p}_{pool}(1-\hat{p}_{pool})}{n_1}+\frac{\hat{p}_{pool}(1-\hat{p}_{pool})}{n_2}}=\sqrt{5.56\times 10^{-5}}=7.46\times 10^{-3}\)
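The standard error formula translates directly into R. This sketch uses the unrounded pooled proportion computed from the hard-coded counts, so it agrees with the hand calculation to three significant figures.

```r
n      <- c(white = 29541, black = 4376)
p_pool <- (8636 + 1710) / sum(n)

# Standard error of p_hat_white - p_hat_black under H0, using the pooled proportion
se <- sqrt(p_pool * (1 - p_pool) / n[["white"]] +
           p_pool * (1 - p_pool) / n[["black"]])
signif(se, 3)  # 0.00746
```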
Our point estimate in this case is
\(\hat{p}_{white}-\hat{p}_{black}=0.2923-0.3908=-0.0985\)
We now have everything we need to calculate our p-value. The following code plots the null distribution of the difference:
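mu <- 0; se <- 0.00746              # null distribution: N(mean = 0, sd = SE)
lb <- -0.0985; ub <- 0.0985         # observed difference and its mirror image
# Span +/- 14 SE so that the observed difference (about 13.2 SE from 0) is visible
x <- seq(-14, 14, length.out = 500) * se + mu
hx <- dnorm(x, mu, se)
plot(x, hx, type = "l", main = "Normal Distribution",
     xlab = "Difference in proportions (white - black)", ylab = "Density")
abline(v = c(lb, ub), col = "red", lty = 2)  # mark the observed difference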
The \(Z\) value is
\(Z=\frac{-0.0985-0}{7.46*10^{-3}}=-13.20\)
The p-value is therefore the probability that the absolute value of the Z score exceeds 13.20:
\(\text{p-value}=P(|Z|>13.2)\approx 0\)
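The Z statistic and two-sided p-value can be computed in R with `pnorm()`. This is a sketch that hard-codes the counts from the table and works with unrounded proportions, so \(Z\) may differ from the hand calculation in the last decimal place.

```r
successes <- c(white = 8636, black = 1710)
n         <- c(white = 29541, black = 4376)
p_hat  <- successes / n
p_pool <- sum(successes) / sum(n)
se     <- sqrt(p_pool * (1 - p_pool) * (1 / n[["white"]] + 1 / n[["black"]]))

# Test statistic: (point estimate - null value) / SE
z <- (p_hat[["white"]] - p_hat[["black"]] - 0) / se
# Two-sided p-value: probability that |Z| exceeds |z| under N(0, 1)
p_value <- 2 * pnorm(-abs(z))

round(z, 2)  # about -13.2
p_value      # effectively 0, far below 0.05
```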
Since the p-value is far below the 5% significance level, we reject the null hypothesis. The data provide convincing evidence of a difference between the proportions of unemployed White and Black people.
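As an optional cross-check that was not part of the original analysis, base R's `prop.test()` runs the same comparison. With `correct = FALSE` (no continuity correction), its chi-squared statistic equals the square of the pooled two-proportion Z, so its square root should match \(|Z|\approx 13.2\) computed above. The counts are hard-coded from the table.

```r
# prop.test() from base R's stats package; correct = FALSE turns off the
# continuity correction so the statistic matches the pooled z-test
res <- prop.test(x = c(8636, 1710), n = c(29541, 4376), correct = FALSE)

res$estimate         # sample proportions for the two groups
sqrt(res$statistic)  # equals |Z| from the pooled z-test
res$p.value          # two-sided p-value
```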