The Effect of Gender on Salary

I will explore the effect of Gender on Salary after 1995 at Houston College of Medicine. A few years ago, female doctors at the College claimed that the institution engaged in gender discrimination in setting salaries. The dataset used for this week’s assignment was presented to the United States District Court of Houston. The female doctors wanted to show that female faculty earned less than male faculty, on average. The dataset was obtained from Kaggle.com.

Dependent variable:

  1. Salary after 1995

Independent variable:

  1. Gender (1=Male, 0=Female)

Department-level variable:

  1. Dept (1=Biochemistry/Molecular Biology, 2=Physiology, 3=Genetics, 4=Pediatrics, 5=Medicine, 6=Surgery)

Packages

library(readr)
library(dplyr)
library(pander)
library(texreg)
library(ggplot2)
library(lme4)
library(nlme)

The above packages will be used for this week’s assignment.

Importing dataset

gender <- read_csv("C:/Users/wroni/OneDrive/Documents/QC MADASR/SOC 712/gender.csv")

head(gender)

This is a quick preview of the dataset used.

Ecological Analysis

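# Collapse to one row per department:
#   mean_p = mean salary (Sal95), mean_s = proportion male (Gender is coded 1=Male)
Deptd <- gender %>% 
  group_by(Dept) %>% 
  summarise(mean_p = mean(Sal95, na.rm = TRUE), mean_s = mean(Gender, na.rm = TRUE))
head(Deptd)
# Department-level regression of proportion male on mean salary
ecoreg <- lm(mean_s ~ mean_p, data = Deptd)
summary(ecoreg)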
## 
## Call:
## lm(formula = mean_s ~ mean_p, data = Deptd)
## 
## Residuals:
##         1         2         3         4         5         6 
##  0.132708  0.054096 -0.029829 -0.183476 -0.007085  0.033586 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 2.877e-01  1.113e-01   2.586   0.0610 .
## mean_p      1.735e-06  6.193e-07   2.801   0.0488 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1186 on 4 degrees of freedom
## Multiple R-squared:  0.6623, Adjusted R-squared:  0.5779 
## F-statistic: 7.846 on 1 and 4 DF,  p-value: 0.04876

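I start with an ecological analysis, which is a department-level analysis: each department contributes one observation, its mean salary (mean_p) and its proportion of male faculty (mean_s). The output above shows that, at the department level, there is a significant positive relationship between Gender and Salary (p = 0.049): departments with higher mean salaries have a larger share of men. However, we cannot say that Gender affects an individual's Salary, because that would be an ecological fallacy: the analysis uses department averages, not individual-level data. We need to conduct pooling analyses on the individual records.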
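To illustrate why a department-level slope cannot be read as an individual-level effect, here is a minimal sketch on simulated data. Everything in it (the groups `g`, the variables `x` and `y`, and the chosen slopes) is hypothetical and unrelated to the salary dataset. The data are built so that group means rise together while the relationship inside each group runs the other way.

```r
set.seed(42)

# Hypothetical data: 6 groups of 50. Across groups, x and y rise together;
# within each group, y falls as x rises.
g <- rep(1:6, each = 50)
x <- g + rnorm(300, 0, 1)
y <- 2 * g - 1 * (x - g) + rnorm(300, 0, 0.5)
d <- data.frame(g, x, y)

# Ecological regression: one row per group, using group means only
means <- aggregate(cbind(x, y) ~ g, data = d, FUN = mean)
eco_slope <- coef(lm(y ~ x, data = means))[["x"]]

# Within-group regression: x centred on its group mean, with group dummies
d$x_within <- d$x - ave(d$x, d$g)
within_slope <- coef(lm(y ~ x_within + factor(g), data = d))[["x_within"]]

c(ecological = eco_slope, within_group = within_slope)
```

In this constructed example the two slopes have opposite signs, which is the situation the ecological fallacy warns about. Whether anything similar holds in the salary data can only be checked with the individual-level models.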

Complete-Pooling Model

cpooling <- lm(Sal95 ~ Gender, data = gender)
summary(cpooling)
## 
## Call:
## lm(formula = Sal95 ~ Gender, data = gender)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -135991  -60711  -17327   44700  277675 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   130877       8077   16.20  < 2e-16 ***
## Gender         64037      10481    6.11 3.64e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 83160 on 259 degrees of freedom
## Multiple R-squared:  0.126,  Adjusted R-squared:  0.1226 
## F-statistic: 37.33 on 1 and 259 DF,  p-value: 3.643e-09

Using the complete-pooling method (an individual-level analysis that ignores department), we can see that there is a significant positive relationship between Gender and Salary. Overall, male faculty earn 64,037 more than female faculty on average. However, this analysis does not include the department-level variable, so it cannot separate a gender gap within departments from salary differences between departments. This model is flawed, and further analysis needs to be conducted.

No-Pooling Model

The Intercept

dcoef <- gender %>% 
    group_by(Dept) %>% 
    do(mod = lm(Sal95 ~ Gender, data = .))
coef <- dcoef %>% do(data.frame(intc = coef(.$mod)[1]))
ggplot(coef, aes(x = intc)) + geom_histogram()

Using the no-pooling method (a separate regression for each department), the intercept is the average female salary in each department, and we can see that it varies between departments. It can be higher than 250,000 but can also be lower than 100,000. Most departments are close to 100,000.

The Slope

dcoef <- gender %>% 
    group_by(Dept) %>% 
    do(mod = lm(Sal95 ~ Gender, data = .))
coef <- dcoef %>% do(data.frame(Genderc = coef(.$mod)[2]))
ggplot(coef, aes(x = Genderc)) + geom_histogram()

The histogram above shows the estimated difference in Salary between males and females within each department. The difference can be less than 20,000 or greater than 40,000. Most of the differences are below 20,000. However, this analysis is still flawed: it treats the departments as unrelated and imposes no structure on the between-department variation.
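With only six departments, a table of the per-department coefficients can be easier to read than a histogram. The sketch below is one way to collect the intercept and the slope from a single pass in base R. It runs on a small simulated data frame (`sim`, with made-up values) so that it is self-contained; the column names mirror the dataset, and the names `fits` and `no_pool` are illustrative.

```r
set.seed(1)

# Simulated stand-in data: 3 departments, 30 people each (hypothetical values)
sim <- data.frame(Dept = rep(1:3, each = 30),
                  Gender = rep(c(0, 1), times = 45))
sim$Sal95 <- 100000 + 20000 * sim$Dept + 15000 * sim$Gender +
  rnorm(nrow(sim), 0, 10000)

# No pooling: fit one regression per department and keep both coefficients
fits <- lapply(split(sim, sim$Dept),
               function(d) coef(lm(Sal95 ~ Gender, data = d)))

# One row per department: female mean (intercept) and male-female gap (slope)
no_pool <- do.call(rbind, fits)
colnames(no_pool) <- c("intercept", "gender_gap")
no_pool
```

Replacing `sim` with the real data frame should give the same six intercepts and six slopes that the two histograms summarise.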

Partial-Pooling

Random Intercept

m1_lme <- lme(Sal95 ~ Gender, data = gender, random = ~1|Dept, method = "ML")
summary(m1_lme)
## Linear mixed-effects model fit by maximum likelihood
##  Data: gender 
##        AIC      BIC    logLik
##   6359.338 6373.596 -3175.669
## 
## Random effects:
##  Formula: ~1 | Dept
##         (Intercept) Residual
## StdDev:    74084.96 44088.87
## 
## Fixed effects: Sal95 ~ Gender 
##                 Value Std.Error  DF  t-value p-value
## (Intercept) 145651.25 30688.734 254 4.746082       0
## Gender       28440.58  5857.688 254 4.855257       0
##  Correlation: 
##        (Intr)
## Gender -0.109
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -3.3977784 -0.6307456 -0.1498175  0.5570775  3.4276404 
## 
## Number of Observations: 261
## Number of Groups: 6

With the partial-pooling method, I combine the two previous approaches: the model estimates a single Gender effect, as in complete pooling, while allowing each department its own intercept, as in no pooling, with the department intercepts drawn from a common distribution. The fixed intercept shows that females earn 145,651.25 on average, and males earn 28,440.58 more than females on average. The standard deviation of the department intercepts (average female salary by department) is 74,084.96.
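As a worked reading of these fixed effects, the sketch below plugs in the estimates reported above. The variable names are illustrative. The comparison with the complete-pooling coefficient (64,037) is only a rough illustration of how much the gap shrinks once departments are accounted for, not a formal test.

```r
# Fixed effects reported by summary(m1_lme)
intercept   <- 145651.25  # average female salary
gender_coef <- 28440.58   # male-female gap within departments

# Complete-pooling gap reported by summary(cpooling)
pooled_gap <- 64037

female_salary <- intercept
male_salary   <- intercept + gender_coef  # 174091.83

# Share of the complete-pooling gap that remains in the random-intercept model
share_remaining <- gender_coef / pooled_gap
round(share_remaining, 2)  # 0.44
```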

Random Slope

m2_lme <- lmer(Sal95 ~ Gender + (Gender|Dept), data= gender)
## boundary (singular) fit: see ?isSingular
summary(m2_lme)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Sal95 ~ Gender + (Gender | Dept)
##    Data: gender
## 
## REML criterion at convergence: 6308.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.4431 -0.6312 -0.2008  0.4970  3.3798 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. Corr
##  Dept     (Intercept) 5.666e+09 75272        
##           Gender      5.910e+07  7687    1.00
##  Residual             1.945e+09 44105        
## Number of obs: 261, groups:  Dept, 6
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   144426      31048   4.652
## Gender         29039       6630   4.380
## 
## Correlation of Fixed Effects:
##        (Intr)
## Gender 0.375 
## convergence code: 0
## boundary (singular) fit: see ?isSingular

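We can see that the average Salary for females across departments is 144,426, with a between-department standard deviation of 75,272. Males earn 29,039 more on average, and this gap varies across departments with a standard deviation of 7,687. Note that the fit is singular: the estimated correlation between the random intercept and the random slope is 1.00, which suggests that this random-effects structure is more complex than six departments can support.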

Model Selection

AIC(cpooling, m1_lme, m2_lme)

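Based on the lowest AIC, the random slope model seems to be the best fit. This comparison should be read with caution, because m1_lme was fit by maximum likelihood while m2_lme was fit by REML, and the random slope fit is singular.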
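AIC values are only comparable when the models are fit by the same likelihood, so one option is to fit every model by maximum likelihood before comparing them. The following is a minimal sketch of that approach using `lmer(..., REML = FALSE)` from lme4. It runs on simulated stand-in data, so the data frame `sim`, the model names `m_lm`, `m_ri`, `m_rs`, and all simulated values are hypothetical, and its output says nothing about the real salary data.

```r
library(lme4)

set.seed(712)

# Simulated stand-in for the salary data; all values here are hypothetical
dept   <- rep(1:6, each = 40)
gender <- rbinom(length(dept), 1, 0.6)
dept_intercept <- rnorm(6, 0, 70000)  # department shifts in baseline salary
dept_gap       <- rnorm(6, 0, 15000)  # department shifts in the gender gap
sim <- data.frame(
  Dept   = factor(dept),
  Gender = gender,
  Sal95  = 145000 + dept_intercept[dept] +
    (28000 + dept_gap[dept]) * gender +
    rnorm(length(dept), 0, 44000)
)

# Fit every model by maximum likelihood so the AIC values share a scale
m_lm <- lm(Sal95 ~ Gender, data = sim)
m_ri <- lmer(Sal95 ~ Gender + (1 | Dept), data = sim, REML = FALSE)
m_rs <- lmer(Sal95 ~ Gender + (Gender | Dept), data = sim, REML = FALSE)

aic_table <- AIC(m_lm, m_ri, m_rs)
aic_table

# TRUE flags a singular fit (a variance near 0 or a correlation near +/-1)
isSingular(m_rs)
```

If the random slope model is singular, a lower AIC alone is weak grounds for preferring it, and the simpler random intercept model may be the safer choice.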

Intra-Class Correlation

m0_lme <- lme(Sal95 ~ 1, random = ~ 1|Dept, data = gender, method = "ML")
summary(m0_lme)
## Linear mixed-effects model fit by maximum likelihood
##  Data: gender 
##        AIC      BIC    logLik
##   6380.073 6390.767 -3187.036
## 
## Random effects:
##  Formula: ~1 | Dept
##         (Intercept) Residual
## StdDev:    77875.46 46044.76
## 
## Fixed effects: Sal95 ~ 1 
##              Value Std.Error  DF  t-value p-value
## (Intercept) 161820  32004.69 255 5.056132       0
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -3.1749606 -0.6715005 -0.1370858  0.5888108  3.3605279 
## 
## Number of Observations: 261
## Number of Groups: 6

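Here, we assess how much of the variation in Salary lies between departments rather than between individuals within a department. The intra-class correlation is a ratio of variances, so the standard deviations reported above are squared first.

77875.46^2/(77875.46^2+46044.76^2)=0.741

About 74.1% of the total variation in Salary can be attributed to department-level influences. The other 25.9% is variation between individuals within departments.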
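The intra-class correlation is conventionally defined as a ratio of variances, so the standard deviations printed by summary(m0_lme) are squared before taking the ratio. A minimal sketch of that arithmetic, using the two values reported above (the variable names are illustrative):

```r
# Standard deviations reported by summary(m0_lme)
sd_dept     <- 77875.46  # between-department SD (random intercept)
sd_residual <- 46044.76  # within-department (residual) SD

# ICC is a ratio of variances, so square the SDs first
var_dept     <- sd_dept^2
var_residual <- sd_residual^2

icc <- var_dept / (var_dept + var_residual)
round(icc, 3)  # 0.741
```

Taking the ratio of the standard deviations themselves gives about 0.629, which understates the department-level share.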

intervals(m0_lme)
## Approximate 95% confidence intervals
## 
##  Fixed effects:
##                lower   est.    upper
## (Intercept) 98913.63 161820 224726.3
## attr(,"label")
## [1] "Fixed effects:"
## 
##  Random Effects:
##   Level: Dept 
##                    lower     est.    upper
## sd((Intercept)) 44007.34 77875.46 137808.6
## 
##  Within-group standard error:
##    lower     est.    upper 
## 42217.16 46044.76 50219.39

Conclusion

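The initial ecological analysis showed a relationship between Gender and Salary, but we could not draw a conclusion about individuals from it because the analysis was at the department level. The pooling analyses support the relationship at the individual level: male faculty earn more than female faculty on average, even after accounting for department. The intra-class correlation shows that a large share of the variation in Salary lies between departments.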