Exam 5

library(mosaic)

The following table shows the total number of admitted and rejected applicants to the six largest departments in Berkeley in 1973.

Admitted Rejected
Male 1198 1493
Female 557 1278
  1. Use a \(\chi^2\)-test to check whether the admission statistics for Berkeley show any sign of gender discrimination. To enter the table in R you can do:
admit <- matrix(c(1198,557,1493,1278),2,2)
rownames(admit) <- c("Male", "Female")
colnames(admit) <- c("Admitted", "Rejected")
admit <- as.table(admit)

Your analysis should as a minimum contain:

H0- There is no discrimination between genders being admitted Ha - There is discrimination between genders being admitted

Xsq = chisq.test(admit)
Xsq$expected
##         Admitted Rejected
## Male   1043.4611 1647.539
## Female  711.5389 1123.461

This is the numbers there should be admitted and rejected if males and females had the same rates of admitance.

expected <- matrix(c(1043.461,711.5389,1647.5389,1123.4611),2,2)
rownames(expected) <- c("Male", "Female")
colnames(expected) <- c("Admitted", "Rejected")
expected <- as.table(expected)

chi = (admit[1,1] - expected[1,1])^2/expected[1,1] + (admit[2,1] - expected[2,1])^2/expected[2,1] + (admit[1,2] - expected[1,2])^2/expected[1,2] + (admit[2,2] - expected[2,2])^2/expected[2,2]
'Discrimination'
## [1] "Discrimination"
chi
## [1] 92.20533

We want to measure the discrimination via the Chi-squared statistic.

degree of freedom = (row-1)(col-1) = 1

pchisq(chi, 1, lower.tail = FALSE)
## [1] 7.813412e-22

This p value is very small. This means that we can reject the H0, and conclude that there is a gender discrimination.

A more detailed data set with the admissions for each department is available on the course web page. The variables are:

Load the data into RStudio:

admission <- read.table("http://asta.math.aau.dk/dan/static/datasets?file=admission.dat", header=TRUE)

Using this data set you have to:

model1 = glm(Freq ~ Dept*Gender*Admit, family=poisson, data = admission)
model1
## 
## Call:  glm(formula = Freq ~ Dept * Gender * Admit, family = poisson, 
##     data = admission)
## 
## Coefficients:
##                    (Intercept)                           DeptB  
##                        4.48864                        -1.65542  
##                          DeptC                           DeptD  
##                        0.81963                         0.38656  
##                          DeptE                           DeptF  
##                        0.05466                        -1.31058  
##                     GenderMale                   AdmitRejected  
##                        1.74969                        -1.54420  
##               DeptB:GenderMale                DeptC:GenderMale  
##                        1.28357                        -2.27046  
##               DeptD:GenderMale                DeptE:GenderMale  
##                       -1.69763                        -2.32269  
##               DeptF:GenderMale             DeptB:AdmitRejected  
##                       -1.83670                         0.79043  
##            DeptC:AdmitRejected             DeptD:AdmitRejected  
##                        2.20464                         2.16617  
##            DeptE:AdmitRejected             DeptF:AdmitRejected  
##                        2.70135                         4.12505  
##       GenderMale:AdmitRejected  DeptB:GenderMale:AdmitRejected  
##                        1.05208                        -0.83205  
## DeptC:GenderMale:AdmitRejected  DeptD:GenderMale:AdmitRejected  
##                       -1.17700                        -0.97009  
## DeptE:GenderMale:AdmitRejected  DeptF:GenderMale:AdmitRejected  
##                       -1.25226                        -0.86318  
## 
## Degrees of Freedom: 23 Total (i.e. Null);  0 Residual
## Null Deviance:       2650 
## Residual Deviance: 1.874e-13     AIC: 207.1
drop1(model1)
## Single term deletions
## 
## Model:
## Freq ~ Dept * Gender * Admit
##                   Df Deviance    AIC
## <none>                  0.000 207.06
## Dept:Gender:Admit  5   20.204 217.26

It has a deviance of zeroa because it does not deviate at all. The drop1 test shows that we can leave out dept:gender:admit, as it has an insignificant p-value

model2 = glm(Freq ~ Dept*Gender + Dept*Admit, family=poisson, data = admission)
model2
## 
## Call:  glm(formula = Freq ~ Dept * Gender + Dept * Admit, family = poisson, 
##     data = admission)
## 
## Coefficients:
##         (Intercept)                DeptB                DeptC  
##             4.24232             -1.48155              1.09523  
##               DeptD                DeptE                DeptF  
##             0.60476              0.35202             -1.15268  
##          GenderMale        AdmitRejected     DeptB:GenderMale  
##             2.03325             -0.59346              1.07581  
##    DeptC:GenderMale     DeptD:GenderMale     DeptE:GenderMale  
##            -2.63462             -1.92709             -2.75479  
##    DeptF:GenderMale  DeptB:AdmitRejected  DeptC:AdmitRejected  
##            -1.94356              0.05059              1.20915  
## DeptD:AdmitRejected  DeptE:AdmitRejected  DeptF:AdmitRejected  
##             1.25833              1.68296              3.26911  
## 
## Degrees of Freedom: 23 Total (i.e. Null);  6 Residual
## Null Deviance:       2650 
## Residual Deviance: 21.74     AIC: 216.8

Now we remove department A by removing the rows 1,2,3,4 (assuming the data is in a data.frame called admission and that department A is in the first four rows):

noA <- admission[-(1:4),]
satModel = glm(Freq ~ Dept*Gender*Admit, family=poisson, data = noA)

Use this reduced dataset to conduct a new analysis which as a minimum contains. - (13) Succesive removal of model terms using drop1 and update starting from the saturated model.

In the end, the Intercept is Females in Dept. B and that are admitted. Intercept is our Alpha. We know this because this is a comparison of factors based on a base (Females, Dept.B, admitted). That is why we are left with the rest of the departments, males and rejected.