Introduction:

This report is the result of the project for the Coursera course “Data Analysis and Statistical Inference”.

Is there a relationship between a person’s gender and their self-described political views (liberal/conservative)?

Data:

The data come from the General Social Survey (GSS), a sociological survey used to collect data on demographic characteristics and attitudes of residents of the United States. The dataset contains 57,061 cases and 114 variables. Note that this is a cumulative data file for surveys conducted between 1972 and 2012, and that not all respondents answered all questions in all years. This study uses only two fields: respondent’s sex (gss$sex) and political views (gss$polviews).

RESPONDENTS SEX

Code respondent’s sex

1 MALE

2 FEMALE

sex <- gss$sex[!is.na(gss$polviews)]
table(sex, useNA = "ifany")
## sex
##   Male Female 
##  21386  26490

After keeping only the respondents with a recorded political view, sex has no missing values.

POLVIEWS

THINK OF SELF AS LIBERAL OR CONSERVATIVE

We hear a lot of talk these days about liberals and conservatives. I’m going to show you a seven-point scale on which the political views that people might hold are arranged from extremely liberal - point 1 - to extremely conservative - point 7. Where would you place yourself on this scale?

polviews <- gss$polviews[!is.na(gss$polviews)]
table(polviews, useNA = "ifany")
## polviews
##     Extremely Liberal               Liberal      Slightly Liberal 
##                  1330                  5582                  6181 
##              Moderate Slightly Conservative          Conservative 
##                 18494                  7691                  7092 
##  Extrmly Conservative 
##                  1506

Exploratory data analysis:

male <- table(polviews[sex=="Male"])/length(polviews[sex=="Male"])
female <- table(polviews[sex=="Female"])/length(polviews[sex=="Female"])
cbind(male,female)
##                          male  female
## Extremely Liberal     0.02918 0.02665
## Liberal               0.11433 0.11842
## Slightly Liberal      0.12859 0.12952
## Moderate              0.35883 0.40846
## Slightly Conservative 0.17605 0.14821
## Conservative          0.15856 0.13971
## Extrmly Conservative  0.03446 0.02903
par(mfrow=c(1,2))
# ylim reaches 0.45 so the tallest bar (female "Moderate", about 0.41) stays inside the axis range
barplot(male, xlab ="Political View", main="Male Political View", ylab="Relative frequency",las=2, ylim = c(0,0.45))
barplot(female, xlab ="Political View", main="Female Political View", ylab="Relative frequency",las=2, ylim = c(0,0.45))

[Figure: bar plots of the relative frequency of each political view, males (left) and females (right)]

par(mfrow=c(1,1))
par(mfrow=c(1,2))
diffs <- male - female
plot(table(polviews, sex), xlab="Political View %", ylab="Sex", main="Political View according to sex", las=2)
barplot(diffs, xlab ="Differences in Political View", main="Male - Female", ylab="Relative frequency",las=2)

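[Figure: mosaic plot of political view by sex (left) and bar plot of the male - female differences in relative frequency (right)]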

par(mfrow=c(1,1))
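
The per-sex proportions shown in this section can also be obtained in one step with base R's prop.table(). The sketch below is self-contained: it assumes the observed counts from the contingency table in the Analysis section are typed in by hand, and the names observed and col_props are introduced here for illustration only. Within the report itself, prop.table(table(polviews, sex), margin = 2) would presumably give the same result.

```r
# Observed counts copied from the contingency table in the Analysis section
# (rows: political view, columns: sex). `observed` is an illustrative name.
observed <- matrix(
  c( 624,   706,
    2445,  3137,
    2750,  3431,
    7674, 10820,
    3765,  3926,
    3391,  3701,
     737,   769),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    polviews = c("Extremely Liberal", "Liberal", "Slightly Liberal", "Moderate",
                 "Slightly Conservative", "Conservative", "Extrmly Conservative"),
    sex = c("Male", "Female")
  )
)

# margin = 2 divides each column by its own total, so every column sums to 1
col_props <- prop.table(observed, margin = 2)
round(col_props, 5)

# Male minus female difference for each political view
round(col_props[, "Male"] - col_props[, "Female"], 5)
```

Dividing by column totals rather than the grand total is what makes the two sexes comparable even though the sample contains more women than men.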

Analysis:

We will analyze two categorical variables, gender and Political View. The first has two levels (male and female); the second has seven levels.

Total <- table(polviews)
Real <- cbind(table(polviews, sex), Total)
Total <- c(table(sex, useNA = "no"), "Total" = length(polviews))
Real <- rbind(Real, Total)
Real
##                        Male Female Total
## Extremely Liberal       624    706  1330
## Liberal                2445   3137  5582
## Slightly Liberal       2750   3431  6181
## Moderate               7674  10820 18494
## Slightly Conservative  3765   3926  7691
## Conservative           3391   3701  7092
## Extrmly Conservative    737    769  1506
## Total                 21386  26490 47876

Observing the table above: Does there appear to be a relationship between gender and Political View?

Hypotheses

H0 (nothing going on): Gender and Political View are independent. Political View rates do not vary by Gender.

HA (something going on): Gender and Political View are dependent. Political View rates do vary by Gender.

As we are evaluating the relationship between two categorical variables (at least one with more than 2 levels), we will perform a chi-square test of independence. The test rests on three ideas:

  1. It quantifies how different the observed counts are from the counts expected if H0 were true.
  2. Large deviations from what would be expected based on sampling variation (chance) alone provide strong evidence for the alternative hypothesis.
  3. It is called an independence test because it evaluates the relationship between two categorical variables.
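
The first idea in the list above can be sketched as a small function: each expected count is (row total × column total) / grand total, and the statistic adds (observed - expected)^2 / expected over every cell of the table. This is a minimal illustration, not the code used in the report; chi_square_stat, observed, stat and dof are names assumed here, and the counts are copied from the contingency table in the Analysis section.

```r
# Observed counts copied from the contingency table in the Analysis section
observed <- matrix(
  c( 624,   706,
    2445,  3137,
    2750,  3431,
    7674, 10820,
    3765,  3926,
    3391,  3701,
     737,   769),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    polviews = c("Extremely Liberal", "Liberal", "Slightly Liberal", "Moderate",
                 "Slightly Conservative", "Conservative", "Extrmly Conservative"),
    sex = c("Male", "Female")
  )
)

# Hypothetical helper: chi-square statistic for a two-way table of counts
chi_square_stat <- function(counts) {
  # Expected count under independence: row total * column total / grand total
  expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
  # Sum the squared, scaled deviations over ALL cells (every row and column)
  sum((counts - expected)^2 / expected)
}

stat <- chi_square_stat(observed)
# Degrees of freedom: (number of rows - 1) * (number of columns - 1)
dof <- (nrow(observed) - 1) * (ncol(observed) - 1)
stat
dof
```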

Conditions for the chi-square test:

  1. Independence: Sampled observations must be independent.

    • random sample/assignment
    • if sampling without replacement, n < 10% of population
    • each case only contributes to one cell in the table
  2. Sample size: Each particular scenario (i.e. cell) must have at least 5 expected cases.

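The data meet these conditions for the chi-square test: the GSS is designed as a random sample of US residents, the 47,876 respondents used here are far fewer than 10% of the US population, each respondent contributes to exactly one cell, and the smallest expected count (about 594, see the Expected table in the next section) is well above 5.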
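
The sample-size condition can be checked directly rather than by eye. The sketch below assumes the observed counts from the Analysis section are entered by hand (observed and expected are illustrative names) and reads the expected counts from the expected component that base R's chisq.test() returns.

```r
# Observed counts copied from the contingency table in the Analysis section
observed <- matrix(
  c( 624,   706,
    2445,  3137,
    2750,  3431,
    7674, 10820,
    3765,  3926,
    3391,  3701,
     737,   769),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    polviews = c("Extremely Liberal", "Liberal", "Slightly Liberal", "Moderate",
                 "Slightly Conservative", "Conservative", "Extrmly Conservative"),
    sex = c("Male", "Female")
  )
)

# chisq.test() returns the expected counts under independence in $expected
expected <- chisq.test(observed)$expected

# Smallest expected cell count, and whether every cell has at least 5
min(expected)
all(expected >= 5)
```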

Calculation

The observed overall male rate of the sample is:

male_rate = sum(sex == "Male") /length(polviews)
male_rate
## [1] 0.4467

If gender and political views are in fact independent (i.e. if H0 is true), how many males would we expect to hold liberal political views? How many moderate or conservative? Let’s calculate the expected counts from the overall rates of males and females:

Expected <- Real
Expected[1:7,1] <- Real[1:7,3]*Real[8,1]/Real[8,3]
Expected[1:7,2] <- Real[1:7,3]*Real[8,2]/Real[8,3]
Expected
##                          Male  Female Total
## Extremely Liberal       594.1   735.9  1330
## Liberal                2493.5  3088.5  5582
## Slightly Liberal       2761.0  3420.0  6181
## Moderate               8261.2 10232.8 18494
## Slightly Conservative  3435.5  4255.5  7691
## Conservative           3168.0  3924.0  7092
## Extrmly Conservative    672.7   833.3  1506
## Total                 21386.0 26490.0 47876
# Sum the contributions of every cell, i.e. both the Male and the Female column
CHI2 <- sum((Real[1:7,1:2] - Expected[1:7,1:2])^2/Expected[1:7,1:2])
df <- (length(levels(sex)) -1) * (length(levels(polviews)) -1)
CHI2
## [1] 176.5
df
## [1] 6
pchisq(CHI2, df, lower.tail = FALSE)
## [1] 1.9e-35
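
As a cross-check on the hand calculation, base R's chisq.test() performs the same test in one call and sums the contributions of every cell in both columns. The sketch below is self-contained and assumes the observed counts from the table in the Analysis section are typed in by hand (observed and result are illustrative names); inside the report, chisq.test(table(polviews, sex)) would presumably be the equivalent call.

```r
# Observed counts copied from the contingency table in the Analysis section
observed <- matrix(
  c( 624,   706,
    2445,  3137,
    2750,  3431,
    7674, 10820,
    3765,  3926,
    3391,  3701,
     737,   769),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    polviews = c("Extremely Liberal", "Liberal", "Slightly Liberal", "Moderate",
                 "Slightly Conservative", "Conservative", "Extrmly Conservative"),
    sex = c("Male", "Female")
  )
)

# Pearson chi-square test of independence on the 7 x 2 table
result <- chisq.test(observed)

result$statistic  # chi-square statistic
result$parameter  # degrees of freedom
result$p.value    # upper-tail probability under H0
```

Any disagreement between these values and the manual ones would point to a cell left out of the sum or a mistake in the expected counts.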

Conclusion:

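The p-value is far below any conventional significance level (for example 0.05), so we reject H0: the data provide strong evidence that gender and political view are dependent. The exploratory proportions suggest where the difference lies: women describe themselves as moderate more often than men (about 41% versus 36%), while men choose the conservative categories more often. Because the GSS is an observational survey, this result shows an association, not a causal effect of gender on political view.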