library(mosaic)
The following table shows the total number of admitted and rejected applicants to the six largest departments in Berkeley in 1973.
| Admitted | Rejected | |
|---|---|---|
| Male | 1198 | 1493 |
| Female | 557 | 1278 |
admit <- matrix(c(1198,557,1493,1278),2,2)
rownames(admit) <- c("Male", "Female")
colnames(admit) <- c("Admitted", "Rejected")
admit <- as.table(admit)
Your analysis should as a minimum contain:
H0- There is no discrimination between genders being admitted Ha - There is discrimination between genders being admitted
Xsq = chisq.test(admit)
Xsq$expected
## Admitted Rejected
## Male 1043.4611 1647.539
## Female 711.5389 1123.461
This is the numbers there should be admitted and rejected if males and females had the same rates of admitance.
expected <- matrix(c(1043.461,711.5389,1647.5389,1123.4611),2,2)
rownames(expected) <- c("Male", "Female")
colnames(expected) <- c("Admitted", "Rejected")
expected <- as.table(expected)
chi = (admit[1,1] - expected[1,1])^2/expected[1,1] + (admit[2,1] - expected[2,1])^2/expected[2,1] + (admit[1,2] - expected[1,2])^2/expected[1,2] + (admit[2,2] - expected[2,2])^2/expected[2,2]
'Discrimination'
## [1] "Discrimination"
chi
## [1] 92.20533
We want to measure the discrimination via the Chi-squared statistic.
degree of freedom = (row-1)(col-1) = 1
pchisq(chi, 1, lower.tail = FALSE)
## [1] 7.813412e-22
This p value is very small. This means that we can reject the H0, and conclude that there is a gender discrimination.
A more detailed data set with the admissions for each department is available on the course web page. The variables are:
Admit (admitted/rejected)Gender (male/female)Dept (department A, B, C, D, E, F)Freq (freqency of each combination)Load the data into RStudio:
admission <- read.table("http://asta.math.aau.dk/dan/static/datasets?file=admission.dat", header=TRUE)
Using this data set you have to:
model1 = glm(Freq ~ Dept*Gender*Admit, family=poisson, data = admission)
model1
##
## Call: glm(formula = Freq ~ Dept * Gender * Admit, family = poisson,
## data = admission)
##
## Coefficients:
## (Intercept) DeptB
## 4.48864 -1.65542
## DeptC DeptD
## 0.81963 0.38656
## DeptE DeptF
## 0.05466 -1.31058
## GenderMale AdmitRejected
## 1.74969 -1.54420
## DeptB:GenderMale DeptC:GenderMale
## 1.28357 -2.27046
## DeptD:GenderMale DeptE:GenderMale
## -1.69763 -2.32269
## DeptF:GenderMale DeptB:AdmitRejected
## -1.83670 0.79043
## DeptC:AdmitRejected DeptD:AdmitRejected
## 2.20464 2.16617
## DeptE:AdmitRejected DeptF:AdmitRejected
## 2.70135 4.12505
## GenderMale:AdmitRejected DeptB:GenderMale:AdmitRejected
## 1.05208 -0.83205
## DeptC:GenderMale:AdmitRejected DeptD:GenderMale:AdmitRejected
## -1.17700 -0.97009
## DeptE:GenderMale:AdmitRejected DeptF:GenderMale:AdmitRejected
## -1.25226 -0.86318
##
## Degrees of Freedom: 23 Total (i.e. Null); 0 Residual
## Null Deviance: 2650
## Residual Deviance: 1.874e-13 AIC: 207.1
drop1 to test whether the model can be simplified.drop1(model1)
## Single term deletions
##
## Model:
## Freq ~ Dept * Gender * Admit
## Df Deviance AIC
## <none> 0.000 207.06
## Dept:Gender:Admit 5 20.204 217.26
It has a deviance of zeroa because it does not deviate at all. The drop1 test shows that we can leave out dept:gender:admit, as it has an insignificant p-value
Dept*Gender + Dept*Admit.model2 = glm(Freq ~ Dept*Gender + Dept*Admit, family=poisson, data = admission)
model2
##
## Call: glm(formula = Freq ~ Dept * Gender + Dept * Admit, family = poisson,
## data = admission)
##
## Coefficients:
## (Intercept) DeptB DeptC
## 4.24232 -1.48155 1.09523
## DeptD DeptE DeptF
## 0.60476 0.35202 -1.15268
## GenderMale AdmitRejected DeptB:GenderMale
## 2.03325 -0.59346 1.07581
## DeptC:GenderMale DeptD:GenderMale DeptE:GenderMale
## -2.63462 -1.92709 -2.75479
## DeptF:GenderMale DeptB:AdmitRejected DeptC:AdmitRejected
## -1.94356 0.05059 1.20915
## DeptD:AdmitRejected DeptE:AdmitRejected DeptF:AdmitRejected
## 1.25833 1.68296 3.26911
##
## Degrees of Freedom: 23 Total (i.e. Null); 6 Residual
## Null Deviance: 2650
## Residual Deviance: 21.74 AIC: 216.8
Now we remove department A by removing the rows 1,2,3,4 (assuming the data is in a data.frame called admission and that department A is in the first four rows):
noA <- admission[-(1:4),]
satModel = glm(Freq ~ Dept*Gender*Admit, family=poisson, data = noA)
Use this reduced dataset to conduct a new analysis which as a minimum contains. - (13) Succesive removal of model terms using drop1 and update starting from the saturated model.
In the end, the Intercept is Females in Dept. B and that are admitted. Intercept is our Alpha. We know this because this is a comparison of factors based on a base (Females, Dept.B, admitted). That is why we are left with the rest of the departments, males and rejected.