This report is the result of the project for the Coursera course “Data Analysis and Statistical Inference”.
Is there a relationship between a person's gender and how they describe their own political views (liberal/conservative)?
The data have been extracted from the General Social Survey (GSS), a sociological survey used to collect data on the demographic characteristics and attitudes of residents of the United States. The dataset contains 57,061 cases and 114 variables. Note that this is a cumulative data file for surveys conducted between 1972 and 2012, and that not all respondents answered all questions in all years. Only two fields are used in this study: respondent’s sex (gss$sex) and political view (gss$polviews).
RESPONDENTS SEX
Code respondent’s sex
1 MALE
2 FEMALE
sex <- gss$sex[!is.na(gss$polviews)]
table(sex, useNA = "ifany")
## sex
## Male Female
## 21386 26490
The table shows no missing values for sex among the respondents who answered the political-view question.
POLVIEWS
THINK OF SELF AS LIBERAL OR CONSERVATIVE
We hear a lot of talk these days about liberals and conservatives. I’m going to show you a seven-point scale on which the political views that people might hold are arranged from extremely liberal - point 1 - to extremely conservative - point 7. Where would you place yourself on this scale?
polviews <- gss$polviews[!is.na(gss$polviews)]
table(polviews, useNA = "ifany")
## polviews
## Extremely Liberal Liberal Slightly Liberal
## 1330 5582 6181
## Moderate Slightly Conservative Conservative
## 18494 7691 7092
## Extrmly Conservative
## 1506
male <- table(polviews[sex=="Male"])/length(polviews[sex=="Male"])
female <- table(polviews[sex=="Female"])/length(polviews[sex=="Female"])
cbind(male,female)
## male female
## Extremely Liberal 0.02918 0.02665
## Liberal 0.11433 0.11842
## Slightly Liberal 0.12859 0.12952
## Moderate 0.35883 0.40846
## Slightly Conservative 0.17605 0.14821
## Conservative 0.15856 0.13971
## Extrmly Conservative 0.03446 0.02903
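The same within-sex relative frequencies can be computed in one step with base R's `prop.table()`, which divides each column of a count matrix by its column total. A minimal sketch, assuming the counts are typed in by hand from the contingency table reported later in this analysis (the name `observed` is illustrative, not part of the original code):

```r
# Observed counts typed in from this report's contingency table (illustrative)
observed <- matrix(
  c(624, 2445, 2750, 7674, 3765, 3391, 737,    # Male
    706, 3137, 3431, 10820, 3926, 3701, 769),  # Female
  ncol = 2,
  dimnames = list(
    polviews = c("Extremely Liberal", "Liberal", "Slightly Liberal",
                 "Moderate", "Slightly Conservative", "Conservative",
                 "Extrmly Conservative"),
    sex = c("Male", "Female")
  )
)

# margin = 2: divide each column by its own total, so each column sums to 1
col_props <- prop.table(observed, margin = 2)
round(col_props, 5)
```

Working from a single sex-by-view matrix keeps both columns aligned on the same level order, which the two separate `table()` calls only do implicitly.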
par(mfrow=c(1,2))
# Same y-axis range for both panels; 0.45 leaves room for the tallest bar (female Moderate, 0.408)
barplot(male, xlab = "Political View", main = "Male Political View", ylab = "Relative frequency", las = 2, ylim = c(0, 0.45))
barplot(female, xlab = "Political View", main = "Female Political View", ylab = "Relative frequency", las = 2, ylim = c(0, 0.45))
par(mfrow=c(1,1))
par(mfrow=c(1,2))
diffs <- male - female
plot(table(polviews, sex), xlab="Political View %", ylab="Sex", main="Political View according to sex", las=2)
barplot(diffs, xlab = "Political View", main = "Male - Female", ylab = "Difference in relative frequency", las = 2)
par(mfrow=c(1,1))
We analyze two categorical variables, gender and Political View. The first has two levels (male and female); the second has seven levels.
Total <- table(polviews)
Real <- cbind(table(polviews, sex), Total)
Total <- c(table(sex, useNA = "no"), "Total" = length(polviews))
Real <- rbind(Real, Total)
Real
## Male Female Total
## Extremely Liberal 624 706 1330
## Liberal 2445 3137 5582
## Slightly Liberal 2750 3431 6181
## Moderate 7674 10820 18494
## Slightly Conservative 3765 3926 7691
## Conservative 3391 3701 7092
## Extrmly Conservative 737 769 1506
## Total 21386 26490 47876
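A more compact way to attach row and column totals to a table of counts is base R's `addmargins()`. This sketch rebuilds the table from the counts printed above (the names `observed` and `with_totals` are illustrative); note that `addmargins()` labels the margins `Sum` rather than `Total`:

```r
# Observed counts typed in from the table above (illustrative)
observed <- matrix(
  c(624, 2445, 2750, 7674, 3765, 3391, 737,    # Male
    706, 3137, 3431, 10820, 3926, 3701, 769),  # Female
  ncol = 2,
  dimnames = list(
    polviews = c("Extremely Liberal", "Liberal", "Slightly Liberal",
                 "Moderate", "Slightly Conservative", "Conservative",
                 "Extrmly Conservative"),
    sex = c("Male", "Female")
  )
)

# as.table() turns the named matrix into a contingency table;
# addmargins() then appends a "Sum" row and a "Sum" column
with_totals <- addmargins(as.table(observed))
with_totals
```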
Looking at the table above, does there appear to be a relationship between gender and Political View?
Hypotheses
H0 (nothing going on): Gender and Political View are independent. Political View rates do not vary by Gender.
HA (something going on): Gender and Political View are dependent. Political View rates do vary by gender.
Because we are evaluating the relationship between two categorical variables, at least one of which has more than two levels, we perform a chi-square test of independence. The test proceeds in three steps: check the conditions, compute the chi-square statistic, and compute the p-value.
Conditions for the chi-square test:
Independence: Sampled observations must be independent.
Sample size: Each particular scenario (i.e. cell) must have at least 5 expected cases.
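Both conditions for the chi-square test are met. Independence is reasonable because the GSS samples respondents from the US adult population, and the 47,876 respondents used here are far fewer than 10% of that population. The sample-size condition also holds: the smallest expected count, 594.1 for extremely liberal males (computed below), is well above 5.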
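The sample-size condition can be checked directly by computing every expected count under H0 and confirming that the smallest one is at least 5. A minimal sketch using the counts from the contingency table above (the names `observed` and `expected` are illustrative):

```r
# Observed counts typed in from the contingency table (illustrative)
observed <- matrix(
  c(624, 2445, 2750, 7674, 3765, 3391, 737,    # Male
    706, 3137, 3431, 10820, 3926, 3701, 769),  # Female
  ncol = 2,
  dimnames = list(
    polviews = c("Extremely Liberal", "Liberal", "Slightly Liberal",
                 "Moderate", "Slightly Conservative", "Conservative",
                 "Extrmly Conservative"),
    sex = c("Male", "Female")
  )
)

# Expected count per cell under independence:
# row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

min(expected)        # smallest expected count in the table
all(expected >= 5)   # TRUE when every cell meets the sample-size condition
```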
Calculation
The observed overall male rate of the sample is:
male_rate <- sum(sex == "Male") / length(polviews)
male_rate
## [1] 0.4467
If gender and political views are in fact independent (i.e. if H0 is true), how many males would we expect to hold liberal political views? How many moderate or conservative? Let’s calculate the expected counts from the overall proportions of males and females:
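Under H0, the expected count for political-view level i and sex j is the row total times the column total divided by the grand total. This is what the two `Expected[...]` assignments compute, with R_i the row total, C_j the column total and N = 47876:

```latex
E_{ij} = \frac{R_i \, C_j}{N},
\qquad \text{e.g.} \qquad
E_{\text{Extremely Liberal},\,\text{Male}} = \frac{1330 \times 21386}{47876} \approx 594.1
```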
Expected <- Real
Expected[1:7,1] <- Real[1:7,3]*Real[8,1]/Real[8,3]
Expected[1:7,2] <- Real[1:7,3]*Real[8,2]/Real[8,3]
Expected
## Male Female Total
## Extremely Liberal 594.1 735.9 1330
## Liberal 2493.5 3088.5 5582
## Slightly Liberal 2761.0 3420.0 6181
## Moderate 8261.2 10232.8 18494
## Slightly Conservative 3435.5 4255.5 7691
## Conservative 3168.0 3924.0 7092
## Extrmly Conservative 672.7 833.3 1506
## Total 21386.0 26490.0 47876
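The chi-square statistic adds up, over every cell of the table (all seven political-view levels for both males and females), the squared difference between the observed count O and the expected count E, scaled by E. The degrees of freedom follow from the numbers of rows and columns:

```latex
\chi^2 = \sum_{i=1}^{7} \sum_{j=1}^{2} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
df = (7 - 1)(2 - 1) = 6
```

For example, the Moderate row contributes (7674 - 8261.2)^2 / 8261.2 ≈ 41.7 from the male cell and (10820 - 10232.8)^2 / 10232.8 ≈ 33.7 from the female cell.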
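# Chi-square statistic: sum over all 14 cells (7 political views x 2 sexes)
CHI2 <- sum((Real[1:7, 1:2] - Expected[1:7, 1:2])^2 / Expected[1:7, 1:2])
df <- (length(levels(sex)) - 1) * (length(levels(polviews)) - 1)
CHI2
## [1] 176.5
df
## [1] 6
# Upper-tail p-value, rounded to two significant digits
signif(pchisq(CHI2, df, lower.tail = FALSE), 2)
## [1] 1.9e-35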
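As a cross-check, base R's `chisq.test()` runs Pearson's chi-square test of independence on a matrix of counts. A minimal sketch, again typing in the observed counts from the contingency table (the names are illustrative). Yates' continuity correction only applies to 2 x 2 tables, so the statistic reported here is the plain sum over all 14 cells:

```r
# Observed counts typed in from the contingency table (illustrative)
observed <- matrix(
  c(624, 2445, 2750, 7674, 3765, 3391, 737,    # Male
    706, 3137, 3431, 10820, 3926, 3701, 769),  # Female
  ncol = 2,
  dimnames = list(
    polviews = c("Extremely Liberal", "Liberal", "Slightly Liberal",
                 "Moderate", "Slightly Conservative", "Conservative",
                 "Extrmly Conservative"),
    sex = c("Male", "Female")
  )
)

# Pearson's chi-square test of independence on the 7 x 2 table
result <- chisq.test(observed)
result$statistic   # X-squared
result$parameter   # degrees of freedom
result$p.value
```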
Insert conclusion here…