I will explore the effect of Gender on Salary in 1995 at Houston College of Medicine. A few years ago, female doctors at the College claimed that the institution engaged in gender discrimination in setting salaries. The dataset used for this week's assignment was presented to the United States District Court of Houston. The female doctors wanted to show that, on average, female faculty were earning less money than male faculty. The dataset was obtained from Kaggle.com.
Dependent variable: Sal95 (salary in 1995)
Independent variable: Gender (1 = male, 0 = female)
Department-level variable: Dept (department)
Packages
library(readr)
library(dplyr)
library(pander)
library(texreg)
library(ggplot2)
library(lme4)
library(nlme)
The above packages will be used for this week’s assignment.
Importing dataset
gender<-read_csv("C:/Users/wroni/OneDrive/Documents/QC MADASR/SOC 712/gender.csv")
head(gender)
This is a quick preview of the dataset used.
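# Department-level means: mean_p = mean 1995 salary; mean_s = share of male faculty (mean of the 0/1 Gender variable)
Deptd <- gender %>%
  group_by(Dept) %>%
  summarise(mean_p = mean(Sal95, na.rm = TRUE), mean_s = mean(Gender, na.rm = TRUE))
head(Deptd)
# Ecological regression: one row per department, share of male faculty regressed on mean salary
ecoreg <- lm(mean_s ~ mean_p, data = Deptd)
summary(ecoreg)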
##
## Call:
## lm(formula = mean_s ~ mean_p, data = Deptd)
##
## Residuals:
## 1 2 3 4 5 6
## 0.132708 0.054096 -0.029829 -0.183476 -0.007085 0.033586
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.877e-01 1.113e-01 2.586 0.0610 .
## mean_p 1.735e-06 6.193e-07 2.801 0.0488 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1186 on 4 degrees of freedom
## Multiple R-squared: 0.6623, Adjusted R-squared: 0.5779
## F-statistic: 7.846 on 1 and 4 DF, p-value: 0.04876
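First, I conduct an ecological analysis, which is an analysis at the department level. Across the six departments, departments with higher average 1995 salaries have a higher share of male faculty. The share of male faculty rises by about 0.17 for every additional 100,000 in mean salary (slope = 1.735e-06 per dollar, p = 0.049, R² = 0.66). With only six departments (4 residual degrees of freedom), this estimate is fragile. More importantly, we cannot conclude from it that Gender affects an individual's Salary. Drawing individual-level conclusions from group-level data would be an ecological fallacy. We therefore need individual-level (faculty-member) models, starting with complete pooling.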
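The ecological fallacy can be made concrete with a small toy example. The sketch below uses made-up numbers, not the Houston data. Within every hypothetical department, men and women are paid exactly the same, but higher-paying departments employ more men. The department-level regression (same form as `ecoreg`) still finds a positive association, and so does a pooled individual-level regression. Only the model that compares men and women within departments recovers the true gap of zero.

```r
# Hypothetical toy data (NOT the Houston dataset): 6 departments, 20 faculty each.
# By construction there is no gender pay gap inside any department.
dept_salary <- c(100000, 150000, 200000, 250000, 300000, 350000)
share_male  <- c(0.2, 0.3, 0.4, 0.6, 0.7, 0.8)  # higher-paying departments hire more men
n_per_dept  <- 20

toy <- do.call(rbind, lapply(seq_along(dept_salary), function(d) {
  n_male <- round(share_male[d] * n_per_dept)
  data.frame(
    Dept   = factor(d, levels = seq_along(dept_salary)),
    Gender = c(rep(1, n_male), rep(0, n_per_dept - n_male)),  # 1 = male, 0 = female
    Sal    = dept_salary[d]                                   # everyone in a department earns the same
  )
}))

# Department-level (ecological) regression, same form as ecoreg: share male ~ mean salary
dept_means <- aggregate(cbind(Sal, Gender) ~ Dept, data = toy, FUN = mean)
eco_fit    <- lm(Gender ~ Sal, data = dept_means)

# Individual-level regressions
pooled_fit <- lm(Sal ~ Gender, data = toy)         # ignores department
within_fit <- lm(Sal ~ Gender + Dept, data = toy)  # compares men and women within departments

coef(eco_fit)["Sal"]        # positive
coef(pooled_fit)["Gender"]  # positive (about 73,333)
coef(within_fit)["Gender"]  # essentially zero
```

The positive department-level and pooled estimates here come entirely from which departments men work in, not from how they are paid within a department.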
cpooling <- lm(Sal95 ~ Gender, data = gender)
summary(cpooling)
##
## Call:
## lm(formula = Sal95 ~ Gender, data = gender)
##
## Residuals:
## Min 1Q Median 3Q Max
## -135991 -60711 -17327 44700 277675
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 130877 8077 16.20 < 2e-16 ***
## Gender 64037 10481 6.11 3.64e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 83160 on 259 degrees of freedom
## Multiple R-squared: 0.126, Adjusted R-squared: 0.1226
## F-statistic: 37.33 on 1 and 259 DF, p-value: 3.643e-09
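Using the complete-pooling method (an individual-level analysis that ignores departments), we see a significant positive relationship between Gender and Salary. Female faculty earn 130,877 on average, and male faculty earn 64,037 more than female faculty on average (p < 0.001). However, this model ignores department entirely. Faculty in the same department share salary levels, so if men are concentrated in higher-paying departments, part of this gap reflects department rather than gender. Further analysis that accounts for department is needed.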
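Because Gender is a single 0/1 indicator, the complete-pooling coefficients have a simple reading. The intercept is the mean salary of the group coded 0, and the slope is the difference between the two group means. Below is a minimal check with a handful of made-up salaries (not the Houston data), assuming the same 1 = male, 0 = female coding.

```r
# Hypothetical salaries for five faculty members (not the Houston data)
toy <- data.frame(
  Sal    = c(100000, 120000, 140000, 150000, 170000),
  Gender = c(0, 0, 1, 1, 1)  # 1 = male, 0 = female
)

fit <- lm(Sal ~ Gender, data = toy)

female_mean <- mean(toy$Sal[toy$Gender == 0])                # 110,000
male_gap    <- mean(toy$Sal[toy$Gender == 1]) - female_mean  # 153,333.33 - 110,000 = 43,333.33

# The intercept equals the female mean; the slope equals the male-female difference
all.equal(unname(coef(fit)), c(female_mean, male_gap))
```

Applied to the real data, the same identity means 130,877 is the average 1995 salary of female faculty and 64,037 is the raw male-female gap, pooled over all departments.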
The Intercept
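# No pooling: fit a separate Sal95 ~ Gender regression within each department
dcoef <- gender %>%
  group_by(Dept) %>%
  do(mod = lm(Sal95 ~ Gender, data = .))
# Each department's intercept (average female salary); stored as dept_int to avoid reusing the name of coef()
dept_int <- dcoef %>% do(data.frame(intc = coef(.$mod)[1]))
ggplot(dept_int, aes(x = intc)) + geom_histogram()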
Using the no-pooling method (a separate regression for each department), each department's intercept is the average salary of its female faculty. The intercepts vary considerably between departments. Some are above 250,000 while others are below 100,000, and most departments are near the 100,000 end.
The Slope
# Reuse the per-department models in dcoef; each slope is that department's male-female salary gap
dept_slope <- dcoef %>% do(data.frame(Genderc = coef(.$mod)[2]))
ggplot(dept_slope, aes(x = Genderc)) + geom_histogram()
The slopes show the male-female salary difference within each department. The difference ranges from less than 20,000 to more than 40,000, and most departments seem to be below 20,000. However, this analysis is still limited. It treats each department as unrelated to the others and imposes no structure on between-department variation, so each department's estimate rests only on its own faculty.
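With only six departments, the intercepts and slopes are easier to read as one table than as histograms. Below is a sketch using `nlme::lmList()`, which fits a separate `lm()` per group. It runs on simulated data with hypothetical department effects; for the real analysis, `gender` would replace `sim`.

```r
library(nlme)

set.seed(1)
# Hypothetical simulated data: 6 departments, 30 faculty each (15 female, 15 male)
n_dept <- 6
n_per  <- 30
sim <- data.frame(
  Dept   = factor(rep(seq_len(n_dept), each = n_per)),
  Gender = rep(c(0, 1), length.out = n_dept * n_per)  # 1 = male, 0 = female
)
dept_base <- seq(100000, 350000, length.out = n_dept)  # hypothetical female mean per department
dept_gap  <- seq(10000, 40000, length.out = n_dept)    # hypothetical male-female gap per department
sim$Sal95 <- dept_base[sim$Dept] + dept_gap[sim$Dept] * sim$Gender +
  rnorm(nrow(sim), sd = 20000)

# No pooling: one regression per department (nlme:: prefix because lme4 also has lmList)
no_pool      <- nlme::lmList(Sal95 ~ Gender | Dept, data = sim)
no_pool_coef <- coef(no_pool)  # one row per department: intercept, then Gender slope
no_pool_coef

# Each department's intercept is its female mean salary (compared after sorting,
# so the order of the groups does not matter)
female_means <- tapply(sim$Sal95[sim$Gender == 0], sim$Dept[sim$Gender == 0], mean)
all.equal(sort(no_pool_coef[[1]]), sort(as.vector(female_means)))
```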
Random Intercept
m1_lme <- lme(Sal95 ~ Gender, data = gender, random = ~1|Dept, method = "ML")
summary(m1_lme)
## Linear mixed-effects model fit by maximum likelihood
## Data: gender
## AIC BIC logLik
## 6359.338 6373.596 -3175.669
##
## Random effects:
## Formula: ~1 | Dept
## (Intercept) Residual
## StdDev: 74084.96 44088.87
##
## Fixed effects: Sal95 ~ Gender
## Value Std.Error DF t-value p-value
## (Intercept) 145651.25 30688.734 254 4.746082 0
## Gender 28440.58 5857.688 254 4.855257 0
## Correlation:
## (Intr)
## Gender -0.109
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.3977784 -0.6307456 -0.1498175 0.5570775 3.4276404
##
## Number of Observations: 261
## Number of Groups: 6
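With the partial-pooling (random-intercept) method, each department gets its own intercept. The intercepts are treated as draws from a common distribution, so the estimates are a compromise between no pooling and complete pooling. The intercept shows that female faculty earn 145,651.25 on average. The Gender coefficient shows that, within departments, male faculty earn 28,440.58 more than female faculty on average (p < 0.001). This gap is less than half of the 64,037 estimated under complete pooling. The standard deviation of female salary across departments is 74,084.96, compared with a within-department residual standard deviation of 44,088.87.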
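The random-intercept model can be written out with the estimates from `m1_lme` plugged in, where $i$ indexes faculty members and $j$ indexes departments. The third line gives the standard formula for the predicted department effect $\hat u_j$ (the best linear unbiased predictor) given the variance components. It shows how partial pooling shrinks each department's mean residual toward zero by a factor $\lambda_j$ that depends on department size $n_j$. The numeric value in the last line assumes a department of average size, $261/6 \approx 43.5$ faculty, because actual department sizes are not reported here.

```latex
\begin{aligned}
\text{Sal95}_{ij} &= \beta_0 + u_j + \beta_1\,\text{Gender}_{ij} + \varepsilon_{ij},
\qquad u_j \sim N(0, \tau^2),\quad \varepsilon_{ij} \sim N(0, \sigma^2) \\
\hat\beta_0 &= 145{,}651,\quad \hat\beta_1 = 28{,}441,\quad \hat\tau = 74{,}085,\quad \hat\sigma = 44{,}089 \\
\hat u_j &= \lambda_j \left( \overline{\text{Sal95}}_{j} - \hat\beta_0 - \hat\beta_1\,\overline{\text{Gender}}_{j} \right),
\qquad \lambda_j = \frac{n_j \hat\tau^2}{n_j \hat\tau^2 + \hat\sigma^2} \\
\lambda_j\big|_{n_j = 43.5} &= \frac{43.5 \times 74{,}085^2}{43.5 \times 74{,}085^2 + 44{,}089^2} \approx 0.99
\end{aligned}
```

With $\lambda_j$ this close to 1, the partially pooled department intercepts stay very close to the no-pooling ones. Shrinkage matters most for small departments, or when $\hat\tau$ is small relative to $\hat\sigma$.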
Random Slope
m2_lme <- lmer(Sal95 ~ Gender + (Gender|Dept), data= gender)
## boundary (singular) fit: see ?isSingular
summary(m2_lme)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Sal95 ~ Gender + (Gender | Dept)
## Data: gender
##
## REML criterion at convergence: 6308.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.4431 -0.6312 -0.2008 0.4970 3.3798
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Dept (Intercept) 5.666e+09 75272
## Gender 5.910e+07 7687 1.00
## Residual 1.945e+09 44105
## Number of obs: 261, groups: Dept, 6
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 144426 31048 4.652
## Gender 29039 6630 4.380
##
## Correlation of Fixed Effects:
## (Intr)
## Gender 0.375
## convergence code: 0
## boundary (singular) fit: see ?isSingular
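We can see that the average salary for female faculty across departments is 144,426, with a standard deviation across departments of 75,272. Male faculty earn 29,039 more on average, and this gender gap varies across departments with a standard deviation of 7,687. However, the fit is singular. The estimated correlation between the random intercept and the random slope is 1.00, which suggests that six departments are too few to estimate a separate random slope reliably. These random-effect estimates should therefore be interpreted with caution.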
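Writing the random-slope model out makes clear which quantity each standard deviation describes. $\tau_0$ is the spread of department-level female salaries, and $\tau_1$ is the spread of the male-female gap across departments (not of male salaries). The estimates are those printed for `m2_lme` (REML). The estimated correlation $\hat\rho = 1.00$ sits on the boundary of its allowed range, which is what the singular-fit message reports.

```latex
\begin{aligned}
\text{Sal95}_{ij} &= (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\,\text{Gender}_{ij} + \varepsilon_{ij} \\
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} &\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \tau_0^2 & \rho\,\tau_0\tau_1 \\ \rho\,\tau_0\tau_1 & \tau_1^2 \end{pmatrix} \right),
\qquad \varepsilon_{ij} \sim N(0, \sigma^2) \\
\hat\beta_0 &= 144{,}426,\quad \hat\beta_1 = 29{,}039,\quad \hat\tau_0 = 75{,}272,\quad \hat\tau_1 = 7{,}687,\quad \hat\rho = 1.00,\quad \hat\sigma = 44{,}105
\end{aligned}
```

Read this way, the gap in department $j$ is $\beta_1 + u_{1j}$. Taken at face value, these estimates imply that about 95% of departments would have gaps within roughly $29{,}039 \pm 2 \times 7{,}687$, i.e. about 13,700 to 44,400. Given the singular fit, that range is only indicative.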
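# lmer() fits by REML by default; refit with maximum likelihood so all three AIC values are comparable
m2_ml <- lmer(Sal95 ~ Gender + (Gender|Dept), data = gender, REML = FALSE)
AIC(cpooling, m1_lme, m2_ml)
The model with the lowest AIC fits best. When the REML fit m2_lme was compared directly with the other two models, the random slope model seemed to be the best fit. REML and ML likelihoods are not comparable, however, so that ranking should be confirmed with the ML refit. Even then, the singular fit of the random slope model is a reason for caution.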
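AIC is only comparable across models fit to the same data by maximum likelihood, whereas `lmer()` uses REML by default. Below is a minimal sketch of a like-for-like comparison, assuming lme4 is installed. It uses simulated data with hypothetical values (not the Houston data) and fits all mixed models with `REML = FALSE`. It then checks the random-slope fit with `isSingular()` and adds a variant with uncorrelated random effects (`||`), a common simplification when the correlation is estimated at ±1.

```r
library(lme4)

set.seed(42)
# Hypothetical simulated data: 6 departments with different base salaries and
# one common gender gap, so there is no true random slope to find
n_dept <- 6
n_per  <- 40
sim <- data.frame(
  Dept   = factor(rep(seq_len(n_dept), each = n_per)),
  Gender = rbinom(n_dept * n_per, size = 1, prob = 0.5)  # 1 = male, 0 = female
)
dept_base <- rnorm(n_dept, mean = 150000, sd = 75000)
sim$Sal95 <- dept_base[sim$Dept] + 30000 * sim$Gender + rnorm(nrow(sim), sd = 45000)

# Fit every candidate by maximum likelihood so the AIC values are comparable
fit_pool  <- lm(Sal95 ~ Gender, data = sim)
fit_int   <- lmer(Sal95 ~ Gender + (1 | Dept),       data = sim, REML = FALSE)
fit_slope <- lmer(Sal95 ~ Gender + (Gender | Dept),  data = sim, REML = FALSE)
fit_indep <- lmer(Sal95 ~ Gender + (Gender || Dept), data = sim, REML = FALSE)

AIC(fit_pool, fit_int, fit_slope, fit_indep)

# TRUE when a variance is estimated at zero or a correlation at +/-1
isSingular(fit_slope)

# Likelihood-ratio test of the random slope against the random intercept
anova(fit_int, fit_slope)
```

The likelihood-ratio p-value for a variance component tends to be conservative, because the null value (zero variance) lies on the boundary of the parameter space.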
m0_lme <- lme(Sal95 ~ 1, random = ~ 1|Dept, data = gender, method = "ML")
summary(m0_lme)
## Linear mixed-effects model fit by maximum likelihood
## Data: gender
## AIC BIC logLik
## 6380.073 6390.767 -3187.036
##
## Random effects:
## Formula: ~1 | Dept
## (Intercept) Residual
## StdDev: 77875.46 46044.76
##
## Fixed effects: Sal95 ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) 161820 32004.69 255 5.056132 0
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.1749606 -0.6715005 -0.1370858 0.5888108 3.3605279
##
## Number of Observations: 261
## Number of Groups: 6
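Here, we fit a null random-intercept model (no predictors) to see how much of the variation in Salary lies between departments rather than between individual faculty members. The intraclass correlation (ICC) is the between-department variance divided by the total variance, so the standard deviations must be squared:
77875.46^2 / (77875.46^2 + 46044.76^2)  # about 0.741
About 74.1% of the total variation in Salary can be attributed to differences between departments. The remaining 25.9% lies within departments, between individual faculty members.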
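The ICC is a ratio of variances, τ² / (τ² + σ²). Rather than retyping the standard deviations, the variance components can be read from the fitted model. Below is a sketch using `nlme::VarCorr()`, which for an `lme` fit returns a character matrix with `Variance` and `StdDev` columns. It runs on simulated data with hypothetical department effects; for the real analysis, `m0_lme` would take the place of `m0_sim`.

```r
library(nlme)

set.seed(2)
# Hypothetical simulated data: 6 departments, 40 faculty each
n_dept <- 6
n_per  <- 40
sim <- data.frame(Dept = factor(rep(seq_len(n_dept), each = n_per)))
dept_effect <- rnorm(n_dept, sd = 75000)
sim$Sal95 <- 160000 + dept_effect[sim$Dept] + rnorm(nrow(sim), sd = 45000)

# Null random-intercept model, same form as m0_lme
m0_sim <- lme(Sal95 ~ 1, random = ~ 1 | Dept, data = sim, method = "ML")

# VarCorr() returns text; convert the Variance column to numbers
vc  <- VarCorr(m0_sim)
v   <- as.numeric(vc[, "Variance"])  # between-department variance, then residual variance
icc <- v[1] / sum(v)
icc
```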
intervals(m0_lme)
## Approximate 95% confidence intervals
##
## Fixed effects:
## lower est. upper
## (Intercept) 98913.63 161820 224726.3
## attr(,"label")
## [1] "Fixed effects:"
##
## Random Effects:
## Level: Dept
## lower est. upper
## sd((Intercept)) 44007.34 77875.46 137808.6
##
## Within-group standard error:
## lower est. upper
## 42217.16 46044.76 50219.39
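The initial ecological analysis showed an association between Gender and Salary at the department level, but we could not draw conclusions about individual faculty members from it. The intraclass correlation shows that a large share of the variation in salary lies between departments, so department must be taken into account. Once it is, the male-female gap shrinks from 64,037 under complete pooling to about 28,000 to 29,000 in the random intercept and random slope models. The gap remains large relative to its standard error in both models. These results are consistent with the claim that female faculty earned less than male faculty in the same departments. However, the models do not control for other individual characteristics that may affect salary.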