Summary

A basic analysis of the effect of faculty gender composition on student performance suggests that a greater share of male faculty is associated with a greater percentage of students remaining enrolled in their university after admittance (referred to as “student yield”). First, a simple model was assembled in which student yield is regressed on only the share of faculty who are male. This first model suggested a substantial influence of male faculty on student yield: because the share is measured on a 0 to 1 scale, the coefficient of roughly 100 means that a one-percentage-point increase in the male share is associated with an increase of about one percentage point in student yield. This first model proved to suffer from a clear case of omitted variable bias (OVB).

Further analysis of the data revealed that factors associated with student aptitude (measured by academic ranking and an Ivy League dummy) and financial burden (measured by tuition) were also associated with student yield. The final model, which adds those three control variables, reduced the coefficient on the male share of faculty by over 40%. Richer data would likely further reduce the amount of OVB in the model.

Exploratory analysis

The share of male professors proved to be correlated with other variables that could also explain student performance. The following plot shows relationships in the data that are important for assessing influences on student performance. Male professor share does appear to have a positive relationship with student yield. Also notable: the highlighted Ivy League schools are all positioned in the upper right portion of the plot. Ivy League schools have both a high proportion of male professors and high student yield. This is intuitive. These schools are more selective and choose especially gifted students. Students accepted into Ivy League schools have a strong track record, so it makes sense that they would remain enrolled for longer than students at non-Ivy League schools. By and large, students who attend these schools also come from affluent backgrounds and may have more support (financial and otherwise) to complete their degrees.

This insight encourages the use of a dummy variable for Ivy League as one potential control. It also suggests that there are other variables that align with both male faculty share and student yield.
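The reasoning above is the standard omitted variable bias argument, and it can be written out explicitly. The block below is a textbook sketch, not a derivation taken from this report's code; the symbols (beta, delta, and the shorthand MaleShare and Ivy) are introduced here only for illustration. It assumes the short and long regressions are both OLS fits on the same observations.

```latex
\begin{align}
\text{StudentYield}_i &= \beta_0 + \beta_1\,\text{MaleShare}_i + \beta_2\,\text{Ivy}_i + u_i
  && \text{(long regression)} \\
\text{Ivy}_i &= \delta_0 + \delta_1\,\text{MaleShare}_i + v_i
  && \text{(auxiliary regression)} \\
\tilde{\beta}_1 &= \hat{\beta}_1 + \hat{\beta}_2\,\hat{\delta}_1
  && \text{(slope when Ivy is omitted)}
\end{align}
```

If Ivy League status raises yield (a positive beta_2) and Ivy League schools have a higher male faculty share (a positive delta_1), the slope from the short regression overstates beta_1. That is consistent with the coefficient falling from 101.809 to 73.462 once the Ivy League dummy is added in the regression table later in this report; under the stated assumption, those two figures and the Ivy coefficient of 26.528 imply a delta_1 estimate of about (101.809 - 73.462) / 26.528 ≈ 1.07.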

While the Ivy League indicator proved promising, it only isolates the differences among a small number of schools. A more general indicator of student strength is academic ranking. Schools with a better ranking are more competitive and, like Ivy League schools, are more selective. On average, students who get into high-ranking schools may have the academic and family backgrounds that enable them to complete their degrees.

Before examining the relationship between academic ranking and student yield, the rankings were reversed so that the schools with the best ranking (typically denoted by low numbers) now have the highest score. This makes interpretation more straightforward: a higher score means a better-ranked school. The graph below suggests there is a relationship between male professor share and academic ranking, as well as between yield and academic ranking.
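The report does not say exactly how the rankings were reversed. A minimal sketch of one common approach, assuming a simple "max plus min minus rank" flip, is shown below; the function name `reverse_ranking` and the sample ranks are hypothetical and are not taken from the data.

```python
def reverse_ranking(ranks):
    """Flip a ranking so the best-ranked school (lowest number) gets the highest score.

    Assumption: a linear flip, score = max + min - rank. The report's actual
    transformation is not documented and may differ.
    """
    lo, hi = min(ranks), max(ranks)
    return [hi + lo - r for r in ranks]


# Illustrative ranks only, not values from the report's data set.
ranks = [1, 2, 50, 98]
scores = reverse_ranking(ranks)
print(scores)  # [98, 97, 49, 1]
```

A linear flip like this preserves the spacing between schools, so it changes only the sign of a regression coefficient on the ranking, not its magnitude or significance.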

For a more complete view of how the different variables vary together, a correlation matrix was used. It was useful for identifying the control variables to be tested. Other indicators considered include tuition (incorporated from 2015 Department of Education data), student-faculty ratio, average SAT score, and average professor salary. Each was expected to reduce OVB because of its correlation with student yield and/or the share of male professors.
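For readers unfamiliar with the technique, a correlation matrix is simply the Pearson correlation computed for every pair of variables. The sketch below uses only the Python standard library; the helper names `pearson` and `correlation_matrix` are hypothetical, and the numbers are made-up values for five imaginary schools, not the report's data. Only the column names are borrowed from the descriptive statistics table.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict {name: {name: r}} covering every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up values for five hypothetical schools, NOT taken from the report's data.
toy = {
    "StudentYield": [30, 35, 42, 55, 70],
    "male_prof_share": [0.55, 0.58, 0.61, 0.64, 0.70],
    "Tuition_thousands": [15.0, 22.0, 48.0, 50.0, 52.0],
}

matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

A candidate control is most useful against OVB when its row shows a sizeable correlation with both the outcome and the regressor of interest.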

## 
## Descriptive Statistics
## ==========================================================================
## Statistic                     Mean      St. Dev.   Min     Max    Median  
## --------------------------------------------------------------------------
## StudentYield                 36.214      13.516     10     84       35    
## male_prof_share               0.607      0.044    0.490   0.745    0.606  
## IvyLeague                     0.071      0.259      0       1        0    
## Tuition_thousands            33.070      14.003   13.101 53.000   26.094  
## Student_faculty_ratio        15.245      5.026      5      27       16    
## SATMath75_Freshmen_average   698.684     60.069    570     800      690   
## ProfSalary_average         137,104.200 25,784.600 85,824 202,464 131,251.5
## AcceptanceRate               48.378      23.658     6      93       51    
## female_econprof_share         0.226      0.087    0.043   0.412    0.218  
## male_econprof_share           0.774      0.087    0.588   0.957    0.782  
## female_econdegreeshare        0.292      0.087    0.111   0.532    0.299  
## female_prof_share             0.393      0.044    0.255   0.510    0.394  
## male_freshmen_share           0.478      0.043    0.315   0.598    0.478  
## AcRanking                    51.194      27.827     1      98      50.5   
## --------------------------------------------------------------------------

Adding each successive control variable to the regression (results below) lowered the coefficient on male professor share. The addition of the Ivy League dummy, academic ranking, and tuition variables caused the coefficient on male professor share to fall by approximately 44% by model 4 (from 101.8 to 57.1). The variable with the largest coefficient, outside of the male professor share, was the Ivy League dummy. In the fourth model, being an Ivy League university is associated with a student yield that is 25.6 percentage points higher, holding all else equal. This proved to be the most useful control. Another encouraging sign in the fourth model is the adjusted R-squared of 0.39, which is nearly four times that of the first model (0.10).

## 
## =======================================================================================
##                                             Dependent variable:                        
##                     -------------------------------------------------------------------
##                                                StudentYield                            
##                           (1)              (2)              (3)              (4)       
## ---------------------------------------------------------------------------------------
## male_prof_share        101.809***       73.462***         59.345**         57.127**    
##                         (29.215)         (25.334)         (26.238)         (25.777)    
## IvyLeague                               26.528***        23.517***        25.573***    
##                                          (4.354)          (4.617)          (4.633)     
## AcRanking                                                  0.081*          0.101**     
##                                                           (0.045)          (0.045)     
## Tuition_thousands                                                          -0.176**    
##                                                                            (0.082)     
## Constant                -25.613          -10.293           -5.648           0.316      
##                         (17.789)         (15.370)         (15.409)         (15.383)    
## ---------------------------------------------------------------------------------------
## Observations               98               98               98               98       
## R2                       0.112            0.362            0.383            0.412      
## Adjusted R2              0.103            0.348            0.363            0.386      
## Residual Std. Error 12.801 (df = 96) 10.911 (df = 95) 10.785 (df = 94) 10.587 (df = 93)
## =======================================================================================
## Note:                                                       *p<0.1; **p<0.05; ***p<0.01
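The headline figures quoted for this table can be checked with a few lines of arithmetic. The sketch below copies the coefficients and adjusted R-squared values from the table above; the variable names are hypothetical. It also converts the male professor share coefficients to a per-percentage-point effect, assuming (as the descriptive statistics indicate) that the share is measured on a 0 to 1 scale and student yield in percent.

```python
# Values copied from the regression table above.
coef_model1 = 101.809   # male_prof_share, model (1)
coef_model4 = 57.127    # male_prof_share, model (4)
adj_r2_model1 = 0.103
adj_r2_model4 = 0.386

# Relative fall in the coefficient after adding the three controls.
drop = (coef_model1 - coef_model4) / coef_model1
print(f"{drop:.1%}")  # 43.9%

# How many times larger the adjusted R-squared is in model (4).
adj_r2_ratio = adj_r2_model4 / adj_r2_model1
print(round(adj_r2_ratio, 2))  # 3.75

# Change in yield (percentage points) per one-percentage-point rise in male share,
# i.e. a 0.01 change on the 0-1 scale.
effect_model1 = coef_model1 * 0.01
effect_model4 = coef_model4 * 0.01
print(round(effect_model1, 2), round(effect_model4, 2))  # 1.02 0.57
```

Read this way, the simple model implies about one percentage point of yield per percentage point of male faculty share, and the model with controls implies a little over half a point.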

While the improvement in fit from model 1 to model 4 is clear, further improvements could be made. OVB is likely still present in these results. Other variables related to student performance, like SAT scores and student-faculty ratio, were considered but ultimately discarded because they were not statistically significant. This was surprising, and repeating this study on additional data sets using SAT scores and student-faculty ratios would be useful. Data on family income and financial aid could also improve this model’s fit. Tuition alone appears to have a small effect on student yield compared to other variables. The price paid per student (after financial aid) and the student’s family income could be indicative of their financial ability to continue in school, so these variables could prove to be stronger determinants of student yield. Other factors that could influence student yield are more social. A student with stronger social ties (proxied by data such as the number of student clubs) may be more likely to remain enrolled than a student with limited connections. Adding these variables may further reduce OVB and, with it, the coefficient on the male share of faculty.