Gender Gap

The gender gap at many higher education institutions has been declining in recent years. Currently, among schools that report data and primarily offer four-year undergraduate degrees, the average female attendance rate is 59.46%.

To show this another way, here is the distribution of female attendance rates for these same schools.
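As a minimal sketch of how an average like this could be computed, the snippet below uses an invented five-row data frame. The column names FEMALE and PREDDEG match the regression appendix, but the coding PREDDEG == 3 for primarily four-year schools and every value in the toy data are assumptions made for illustration.

```r
# Toy stand-in for the school-level data; all values are invented.
schools <- data.frame(
  INSTNM  = c("School A", "School B", "School C", "School D", "School E"),
  PREDDEG = c(3, 3, 2, 3, 3),            # assumed coding: 3 = primarily four-year degrees
  FEMALE  = c(0.62, 0.55, 0.70, NA, 0.43) # share of students who are female
)

# Keep four-year schools that actually report a female share.
four_year <- subset(schools, PREDDEG == 3 & !is.na(FEMALE))

# Average female attendance rate, expressed as a percentage.
avg_female_pct <- 100 * mean(four_year$FEMALE)

# A compact numeric view of the distribution across schools.
rate_quantiles <- quantile(100 * four_year$FEMALE, probs = c(0.25, 0.5, 0.75))
```

Passing the same vector to hist() would draw the distribution rather than summarize it.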


Gender Gap at the University of Massachusetts-Lowell

The University of Massachusetts-Lowell falls below the national average of 59.46%. Only 43.44% of UMASS-Lowell students are female, 16.02 percentage points below the national average.

Demonstrated Graphically:

The fact that UMASS-Lowell’s female attendance rate lags so far behind the national average suggests that this difference is unlikely to have occurred randomly. With this observation in mind, the next question to consider is what systemic factors may account for this difference. Something about UMASS-Lowell appears to make it a less competitive option for female students. Isolating which differences account for the deficit of female students will help focus attention on possible options to correct this trend.
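The size of the gap, and one descriptive way to place a school within the national distribution, can be sketched as follows. The two rates come from the text; the vector of school-level rates is invented, and an empirical percentile is only a description, not a formal test of randomness.

```r
national_avg <- 59.46   # percent, from the text
uml_rate     <- 43.44   # percent, from the text

# Difference in percentage points.
gap_points <- national_avg - uml_rate

# Invented school-level rates, used only to show a percentile lookup.
rates <- c(38, 43.44, 52, 57, 59, 61, 63, 66, 70, 74)

# Share of schools whose rate is at or below the focal school's rate.
share_at_or_below <- ecdf(rates)(uml_rate)
```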

When and why? These are the two main questions that must be answered in order to understand UMASS-Lowell’s gender gap.

When in the admissions process is UMASS-Lowell losing female applicants?

Why is UMASS-Lowell losing these applicants?

In this analysis I will focus only on possible factors for why UMASS-Lowell is losing female students. The factors impacting gender gaps nationally can be broken up into two large categories: systematic and individual. Systematic causes are things that can be compared from school to school. They are deterministic. For example, university X has a low gender gap because of program Y, or college A has a higher gender gap because it is located in region B. Individual causes are more coincidental and cannot be tested by a model. For instance, school C has a high gender gap because of scandal Z. I will focus only on systematic causes.

To understand potential systematic causes I will frame UMASS-Lowell’s gender gap in terms of three possible explanatory hypotheses. Modeling these three hypotheses, I will isolate the unique attributes of UMASS-Lowell which may begin to account for its gender gap.

The three hypotheses I will use are:

The institutional approach will consider the fixed aspects of UMASS-Lowell which might affect its gender gap (e.g. whether it is public or private, the region of the country it is located in, or the level of urbanization of its location).

The student body approach will analyze to what extent the student body that UMASS-Lowell currently attracts is drawn from a more male-dominated population. This method will highlight demographic disparities between UMASS-Lowell and other higher education institutions which may account for UMASS-Lowell’s gender gap. It will also highlight communities in which an increased marketing focus may produce more female enrollment.

The programmatic approach will compare degree program participation rates across the country with female attendance rate. With this method, I will seek to understand how certain types of programs attract a larger female student body. I will compare the national rates of programs that are highly correlated with high female enrollment rates and UMASS-Lowell’s degree programs. Similar to the student body approach, this method will highlight what programmatic areas UMASS-Lowell lacks for which an increased focus may result in higher female enrollment.

The models derived from these hypothetical explanations for UMASS-Lowell’s gender gap do not seek to demonstrate causation. Rather, they highlight areas that are highly correlated with high female attendance rates. The purpose of this study is to emphasize areas of interest. Further research is necessary to determine how effective a focus on these areas would be in increasing female enrollment. At the end of this study I will include a series of next-step recommendations.

Finally, I will look at both a generalized (national) model, and a model that is restricted to only UMASS-Lowell’s cohort. This serves two important purposes. First, it will show if specific factors affecting female enrollment persist on both a national and localized level. Second, it will show to what extent UMASS-Lowell’s cohort helps to account for its gender gap.
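The national versus cohort comparison can be sketched by fitting one formula to the full data and again to a subset. Everything here is simulated: the IN_COHORT flag is a hypothetical column standing in for however the cohort is actually defined, and the two predictors are just a small selection of the variable names used in the appendix.

```r
set.seed(1)
n <- 200

# Simulated school-level data; IN_COHORT is a hypothetical membership flag.
schools <- data.frame(
  FEMALE    = runif(n, 0.3, 0.8),
  PCIP14    = runif(n, 0, 0.4),
  NON_WHITE = runif(n, 0, 1),
  IN_COHORT = rep(c(TRUE, FALSE), times = c(40, 160))
)

# Use the same specification for both fits so coefficients are comparable.
model_formula <- FEMALE ~ PCIP14 + NON_WHITE

national_lm <- lm(model_formula, data = schools)
cohort_lm   <- lm(model_formula, data = subset(schools, IN_COHORT))

# Side-by-side coefficients: one row per term, one column per model.
comparison <- cbind(national = coef(national_lm), cohort = coef(cohort_lm))
```

A factor that keeps its sign and rough size in both columns persists at the localized level; one that shrinks or flips is specific to the national sample.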

Modeling the Gender Gap

Designing the Model

The variables used to develop this regression describe the three hypotheses. Each hypothesis can be broken into a set of variables as follows.

Institutional Approach:

  1. Enrollment

  2. Admission Rate

  3. Public vs. Not for Profit Private vs. For Profit Private

  4. Region of the United States

  5. Local Setting

  6. Commuter vs. Residential

  7. Religious vs. Non-religious

  8. Transfer in Rate

Student Body Approach:

  1. Average SAT score

  2. Average age of student upon entry

  3. % Part-time students

  4. % First generation

  5. % non-white

  6. Natural Log of Family Income

Programmatic Approach:

  1. Ethnic, Cultural, and Gender Studies

  2. Computer, Information Sciences, and Support Services

  3. Engineering

  4. Engineering Technologies and Engineering-Related Fields

  5. Foreign Languages, Literatures, and Linguistics

  6. English Language and Literature/Letters

  7. Liberal Arts and Sciences, General Studies and Humanities

  8. Library Science

  9. Biological and Biomedical Sciences

  10. Mathematics and Statistics

  11. Multi/Interdisciplinary Studies

  12. Philosophy and Religious Studies

  13. Physical Sciences

  14. Psychology

  15. Homeland Security, Law Enforcement, Firefighting, and Related Protective Services

  16. Social Sciences

  17. Visual and Performing Arts

  18. Health Professions and Related Programs

  19. Business, Management, Marketing, and Related Support Services

  20. History
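Two of the variables above need preparation before they enter a regression: categorical codes must be treated as factors, and family income is log-transformed. The sketch below shows one way this could look on invented data. The raw income column name FAMINC and the meaning of the CONTROL codes (1 public, 2 private not for profit, 3 private for profit) are assumptions; the resulting column names match those in the regression output.

```r
# Invented rows; FAMINC is an assumed name for the raw family income column.
raw <- data.frame(
  CONTROL = c(1, 2, 3, 1, 2, 1),
  FAMINC  = c(45000, 82000, 30000, 61000, 98000, 52000),
  FEMALE  = c(0.58, 0.61, 0.66, 0.44, 0.57, 0.60)
)

# Treat the ownership code as a category, not a number.
raw$CONTROL <- factor(raw$CONTROL)

# Natural log of family income.
raw$logFAMINC <- log(raw$FAMINC)

# lm() expands a factor into one indicator column per non-reference level.
design <- model.matrix(FEMALE ~ CONTROL + logFAMINC, data = raw)
design_columns <- colnames(design)
```

Here the first level (CONTROL 1) is the reference, so terms named CONTROL2 and CONTROL3 measure differences from it. Region and locale terms in the output can be read the same way.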

I will compare each hypothesis with the others using the adjusted R-squared for each model in order to determine which hypothesis has the overall greatest impact on the gender gap. Next, I will compare the individual components of each hypothesis against one another by including them all in one large model and comparing the coefficients of the significant variables. I will then measure how UMASS-Lowell compares to other schools based on the variables which play a significant role in the gender gap in order to determine areas UMASS-Lowell can improve. Finally, I will take the same regression model and narrow the observations I analyze to only those schools within UMASS-Lowell’s cohort. I will then determine which of the variables of interest are still impactful at the localized level. Additionally, I will compare UMASS-Lowell to other schools in its cohort to see if it remains an outlier based on its gender gap.
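The first step, comparing hypotheses by adjusted R-squared, can be sketched as below. The data are simulated so that the program share drives most of the variation, and each hypothesis is reduced to a single predictor to keep the example short; the real models use the full variable lists above.

```r
set.seed(42)
n <- 300

# Simulated predictors, one per hypothesis.
d <- data.frame(
  ADM_RATE  = runif(n),
  NON_WHITE = runif(n),
  PCIP14    = runif(n, 0, 0.5)
)

# Simulated outcome in which the program share matters most.
d$FEMALE <- 0.6 - 0.4 * d$PCIP14 + 0.05 * d$NON_WHITE +
  0.01 * d$ADM_RATE + rnorm(n, sd = 0.05)

# One model per hypothesis.
models <- list(
  institutional = lm(FEMALE ~ ADM_RATE, data = d),
  student_body  = lm(FEMALE ~ NON_WHITE, data = d),
  programmatic  = lm(FEMALE ~ PCIP14, data = d)
)

# Pull the adjusted R-squared out of each model summary and rank them.
adj_r2 <- sapply(models, function(m) summary(m)$adj.r.squared)
ranked_r2 <- sort(adj_r2, decreasing = TRUE)
```

One caution when reading such a comparison: lm() drops rows with missing values, so models built on different variable sets may be fit to different schools.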


Results

Summary

The results show several different factors from each hypothesis are impactful in explaining variation in gender gaps across institutions in the United States. Though these variables of interest do not demonstrate causation, they are highly correlated with variance in the gender gap. I will discuss each area at length below and compare them to UMASS-Lowell.

The most impactful hypothesis was the programmatic approach (adjusted R-squared of 0.41). The types of programs offered at a school are most highly correlated with variation in the gender gap. The next most impactful hypothesis was the student body approach (0.14). The institutional approach was least correlated with variation in the gender gap (0.10), although it still accounted for a noticeable amount of variation.

Next, we need to consider the individual elements more and see what factors in each category are influential.

The Programmatic Approach

Ranked from the greatest positive effect on female attendance rates (correlated with higher attendance) to the greatest negative effect (correlated with lower attendance), the programs were:

Greatest Positive Impact

  1. English Language and Literature/Letters
  2. Health Professions and Related Programs
  3. Psychology
  4. Visual and Performing Arts
  5. Biological and Biomedical Sciences
  6. Liberal Arts and Sciences, General Studies and Humanities

Neutral (Least Positive Impact / Least Negative Impact)

  7. History
  8. Engineering Technologies and Engineering-Related Fields
  9. Philosophy and Religious Studies
  10. Engineering
  11. Computer and Information Sciences and Support Services

Greatest Negative Impact
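A ranking like the one above can be produced by keeping the statistically significant coefficients from a fitted model and sorting them by estimate. This is a sketch on simulated data with four program shares (PCIP11 computer sciences, PCIP23 English, PCIP51 health professions, PCIP54 history); the effect sizes are invented and the 0.05 cutoff is an assumption.

```r
set.seed(7)
n <- 400

# Simulated program shares.
d <- data.frame(
  PCIP11 = runif(n, 0, 0.3),
  PCIP23 = runif(n, 0, 0.1),
  PCIP51 = runif(n, 0, 0.5),
  PCIP54 = runif(n, 0, 0.05)
)

# Simulated outcome; PCIP54 has no true effect.
d$FEMALE <- 0.55 - 0.5 * d$PCIP11 + 0.7 * d$PCIP23 +
  0.3 * d$PCIP51 + rnorm(n, sd = 0.05)

fit <- lm(FEMALE ~ ., data = d)

# Coefficient table as a data frame, without the intercept.
coefs <- as.data.frame(coef(summary(fit)))
coefs <- coefs[rownames(coefs) != "(Intercept)", ]

# Keep significant terms and order from most positive to most negative.
significant <- coefs[coefs[["Pr(>|t|)"]] < 0.05, ]
ranked <- significant[order(significant$Estimate, decreasing = TRUE), ]
```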

UMASS-Lowell compared by programs to the national average:

UMASS-Lowell’s participation rate in programs correlated with higher female enrollment is similar to the national average. It is more competitive in areas that are correlated with lower female enrollment, and this difference is significant. It is therefore likely that programmatic offerings at UMASS-Lowell play a role in describing its gender gap. Specifically, the computer science and engineering programs are very popular at UMASS-Lowell, but nationally they are associated with lower female enrollment.
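To give a sense of scale, the full-model estimates in the appendix can be turned into an implied change in female share. The two coefficients are copied from that output; the ten-point shift is a hypothetical scenario, and the calculation assumes both the program shares and the female share are recorded as proportions between 0 and 1. It describes an association, not a causal effect.

```r
# Full-model estimates copied from the appendix output.
coef_pcip14 <- -3.638e-01   # PCIP14: engineering share of degrees
coef_pcip11 <- -5.308e-01   # PCIP11: computer and information sciences share

# Hypothetical scenario: ten percentage points more of degrees in the field.
share_increase <- 0.10

# Associated change in the female share, holding other terms fixed.
change_engineering <- coef_pcip14 * share_increase
change_computing   <- coef_pcip11 * share_increase

# The same changes expressed in percentage points.
change_points <- round(100 * c(engineering = change_engineering,
                               computing   = change_computing), 2)
```

Under these assumptions the associated differences are about 3.6 and 5.3 percentage points lower female share, respectively.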

The Student Body Approach

Four student body descriptors seem to be correlated with variation in the gender gap. In order of impact they are:

Factor                              Impact on Female Enrollment   Rank
Higher rate of non-white students   Higher                        1st
Higher rate of part-time students   Higher                        2nd
Higher average age of entry         Higher                        3rd
Higher SAT average                  Higher                        4th

UMASS-Lowell compared by student body to the national average:

Higher diversity is correlated with a higher rate of female enrollment, and it is the student body descriptor with the greatest impact. UMASS-Lowell is less diverse than the average school in the United States. For all other significant factors, UMASS-Lowell is at or above the national average, so these factors do not appear to be holding back its female enrollment.
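A school-versus-national comparison of this kind can be sketched as a small table of differences. The four schools and all their values are invented, and "Example University" is a placeholder for the focal school; only the column names are taken from the regression output.

```r
# Invented data; "Example University" stands in for the focal school.
schools <- data.frame(
  INSTNM    = c("Example University", "School B", "School C", "School D"),
  NON_WHITE = c(0.30, 0.45, 0.50, 0.35),
  PPTUG_EF  = c(0.10, 0.05, 0.20, 0.05),
  AGE_ENTRY = c(21, 20, 24, 19)
)

vars <- c("NON_WHITE", "PPTUG_EF", "AGE_ENTRY")

# National mean of each descriptor.
national_mean <- colMeans(schools[, vars], na.rm = TRUE)

# The focal school's values as a named numeric vector.
focal <- unlist(schools[schools$INSTNM == "Example University", vars])

# One row per descriptor: school value, national mean, and the gap.
comparison <- data.frame(
  school     = focal,
  national   = national_mean,
  difference = focal - national_mean
)
```

A negative difference on a descriptor with a positive coefficient, as with NON_WHITE in this toy example, marks an area worth investigating.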

The Institutional Approach

Several institutional factors were correlated with variance in female attendance.

Factor                    Impact                      UMASS-Lowell
Private school            Negative                    No
Region: Mid East          Positive                    No
Region: Southeast         Positive                    No
Region: Rocky Mountains   Negative                    No
Region: Far West          Positive                    No
All other regions         No significant impact
Urban/Suburban/Rural      Positive/Neutral/Negative   Neutral/Positive
Higher admission rate     Positive                    -

UMASS-Lowell’s female attendance rate does not seem to be significantly impacted by its institutional characteristics. Though institutional characteristics do play a role when comparing all schools, these factors do not appear to be impacting UMASS-Lowell (either positively or negatively).


Conclusion & Recommendations

The research contained above does not attempt to understand what causes lower female attendance. Rather, I attempt to determine what factors are correlated with decreased female attendance, and which of these characteristics UMASS-Lowell should investigate further in attempting to correct its own gender gap. I compared three main hypotheses to determine which was most impactful on a national level in determining female enrollment. My research suggests that programmatic offerings are the most correlated with variation in female attendance rates, followed by student body makeup and institutional factors. Among the programmatic factors, UMASS-Lowell had high rates of participation in several programs that are highly correlated on a national level with reduced female enrollment. For programs correlated with higher female attendance, UMASS-Lowell was average when compared nationally. Looking at student body factors, female attendance is correlated with more diversity, more part-time students, a higher average age of entry, and higher average SAT scores. Higher diversity was most highly correlated with higher female attendance, and it was the area in which UMASS-Lowell struggled the most. Finally, UMASS-Lowell was average for most institutional factors. It does not appear that UMASS-Lowell’s gender gap is heavily influenced by institutional factors such as region, ownership, or level of urbanization.

Next Steps

  • UMASS-Lowell should seek to understand whether diversity and its programmatic offerings are causes of its lower rate of female attendance or are merely correlated with it.
  • UMASS-Lowell should seek to increase its diversity and female attendance rate in tandem as the two are related.
  • UMASS-Lowell should seek to offer more programs correlated with higher female enrollment.
  • UMASS-Lowell should seek to market its traditionally male-dominated programs to female students.
  • UMASS-Lowell should investigate what factors make engineering and computer science programs more attractive to female students.

Regression and Statistical Appendix

gender_gap_lm <- lm(FEMALE~. -FEMALE.x-FEMALE.y-INSTNM, data = gender_gap_data)

summary(gender_gap_lm)
## 
## Call:
## lm(formula = FEMALE ~ . - FEMALE.x - FEMALE.y - INSTNM, data = gender_gap_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.37806 -0.03913  0.00322  0.03987  0.35989 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -3.490e-01  1.944e-01  -1.796 0.072804 .  
## UGDS                -4.181e-07  3.902e-07  -1.071 0.284199    
## ADM_RATE             6.594e-02  1.412e-02   4.669 3.36e-06 ***
## CONTROL2            -1.827e-02  7.795e-03  -2.344 0.019239 *  
## CONTROL3             1.084e-02  2.764e-02   0.392 0.694929    
## REGION2              2.144e-02  9.655e-03   2.220 0.026580 *  
## REGION3              1.076e-02  1.035e-02   1.040 0.298618    
## REGION4             -7.527e-03  1.109e-02  -0.678 0.497593    
## REGION5              2.171e-02  1.018e-02   2.133 0.033148 *  
## REGION6             -5.865e-03  1.235e-02  -0.475 0.635054    
## REGION7             -3.051e-02  1.529e-02  -1.996 0.046198 *  
## REGION8              2.310e-02  1.177e-02   1.964 0.049811 *  
## REGION9              6.951e-02  4.277e-02   1.625 0.104380    
## LOCALE12            -1.099e-02  7.898e-03  -1.392 0.164162    
## LOCALE13            -1.071e-02  7.815e-03  -1.370 0.170903    
## LOCALE21            -2.576e-02  7.203e-03  -3.575 0.000363 ***
## LOCALE22            -2.029e-02  1.303e-02  -1.558 0.119562    
## LOCALE23            -3.045e-02  1.484e-02  -2.051 0.040442 *  
## LOCALE31            -5.010e-02  1.112e-02  -4.506 7.25e-06 ***
## LOCALE32            -2.921e-02  8.441e-03  -3.461 0.000557 ***
## LOCALE33            -3.997e-02  9.808e-03  -4.075 4.89e-05 ***
## LOCALE41            -5.029e-02  1.357e-02  -3.705 0.000221 ***
## LOCALE42            -8.142e-02  2.162e-02  -3.766 0.000173 ***
## LOCALE43            -4.026e-02  2.182e-02  -1.845 0.065290 .  
## COMMUTERresidential  7.081e-03  6.523e-03   1.086 0.277845    
## RELIGIOUSreligious  -7.671e-03  6.682e-03  -1.148 0.251195    
## TRANSFER_INlow       9.679e-03  5.320e-03   1.819 0.069102 .  
## PCIP05              -1.908e-02  3.241e-01  -0.059 0.953054    
## PCIP11              -5.308e-01  6.828e-02  -7.774 1.60e-14 ***
## PCIP14              -3.638e-01  3.099e-02 -11.742  < 2e-16 ***
## PCIP15              -2.183e-01  9.841e-02  -2.218 0.026724 *  
## PCIP16               1.607e-02  2.110e-01   0.076 0.939305    
## PCIP23               7.472e-01  1.368e-01   5.463 5.66e-08 ***
## PCIP24               7.694e-02  3.480e-02   2.211 0.027227 *  
## PCIP26               1.158e-01  5.434e-02   2.131 0.033273 *  
## PCIP27               9.309e-02  2.150e-01   0.433 0.665147    
## PCIP30               9.795e-02  5.676e-02   1.726 0.084673 .  
## PCIP38              -2.204e-01  9.807e-02  -2.248 0.024757 *  
## PCIP40              -1.559e-01  1.518e-01  -1.027 0.304618    
## PCIP42               1.958e-01  5.448e-02   3.594 0.000338 ***
## PCIP43              -2.128e-02  4.499e-02  -0.473 0.636268    
## PCIP45               1.551e-03  5.239e-02   0.030 0.976387    
## PCIP50               1.625e-01  2.756e-02   5.896 4.80e-09 ***
## PCIP51               2.759e-01  2.130e-02  12.955  < 2e-16 ***
## PCIP52              -8.692e-03  2.523e-02  -0.345 0.730520    
## PCIP54              -1.534e-01  1.953e-01  -0.785 0.432560    
## PPTUG_EF             7.168e-02  3.125e-02   2.294 0.021982 *  
## SAT_AVG              1.371e-04  3.502e-05   3.915 9.53e-05 ***
## AGE_ENTRY            1.432e-02  1.641e-03   8.724  < 2e-16 ***
## FIRST_GEN           -1.950e-02  4.936e-02  -0.395 0.692849    
## NON_WHITE            9.092e-02  1.669e-02   5.447 6.17e-08 ***
## logFAMINC            3.156e-02  1.622e-02   1.946 0.051929 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07309 on 1226 degrees of freedom
##   (973 observations deleted due to missingness)
## Multiple R-squared:  0.5619, Adjusted R-squared:  0.5436 
## F-statistic: 30.83 on 51 and 1226 DF,  p-value: < 2.2e-16

plot(gender_gap_lm)

instituional_lm <- lm(FEMALE~. -INSTNM, data = inst_controls_2)

summary(instituional_lm)
## 
## Call:
## lm(formula = FEMALE ~ . - INSTNM, data = inst_controls_2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.49277 -0.05371  0.00474  0.06039  0.38808 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          3.002e-01  1.119e-01   2.684 0.007350 ** 
## UGDS                -9.374e-07  4.788e-07  -1.958 0.050411 .  
## ADM_RATE             3.973e-02  1.506e-02   2.638 0.008416 ** 
## CONTROL2             1.341e-02  9.287e-03   1.444 0.148832    
## CONTROL3             6.737e-02  1.584e-02   4.254 2.23e-05 ***
## REGION1              3.260e-01  1.122e-01   2.905 0.003728 ** 
## REGION2              3.252e-01  1.119e-01   2.906 0.003715 ** 
## REGION3              3.092e-01  1.121e-01   2.757 0.005902 ** 
## REGION4              3.000e-01  1.123e-01   2.671 0.007643 ** 
## REGION5              3.247e-01  1.120e-01   2.899 0.003795 ** 
## REGION6              2.926e-01  1.124e-01   2.604 0.009314 ** 
## REGION7              2.565e-01  1.134e-01   2.262 0.023843 *  
## REGION8              3.223e-01  1.122e-01   2.873 0.004125 ** 
## REGION9              2.906e-01  1.136e-01   2.558 0.010612 *  
## LOCALE12            -1.204e-02  1.050e-02  -1.146 0.251914    
## LOCALE13            -2.131e-02  9.977e-03  -2.136 0.032841 *  
## LOCALE21            -2.796e-02  9.002e-03  -3.106 0.001932 ** 
## LOCALE22            -3.363e-02  1.660e-02  -2.026 0.042950 *  
## LOCALE23            -1.931e-02  2.005e-02  -0.963 0.335651    
## LOCALE31            -5.454e-02  1.601e-02  -3.407 0.000674 ***
## LOCALE32            -5.453e-02  1.102e-02  -4.947 8.37e-07 ***
## LOCALE33            -5.107e-02  1.343e-02  -3.801 0.000150 ***
## LOCALE41            -4.163e-02  1.796e-02  -2.318 0.020566 *  
## LOCALE42            -7.918e-02  2.495e-02  -3.173 0.001537 ** 
## LOCALE43            -3.352e-02  3.075e-02  -1.090 0.275851    
## COMMUTERresidential -4.587e-02  7.628e-03  -6.014 2.26e-09 ***
## RELIGIOUSreligious   7.401e-03  7.972e-03   0.928 0.353395    
## TRANSFER_INlow      -1.579e-02  6.239e-03  -2.530 0.011494 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1111 on 1532 degrees of freedom
##   (553 observations deleted due to missingness)
## Multiple R-squared:  0.1186, Adjusted R-squared:  0.103 
## F-statistic: 7.633 on 27 and 1532 DF,  p-value: < 2.2e-16

programatic_lm <- lm(FEMALE~.-INSTNM, data = programs)

summary(programatic_lm)
## 
## Call:
## lm(formula = FEMALE ~ . - INSTNM, data = programs)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.51814 -0.04726  0.00479  0.05226  0.42812 
## 
## Coefficients: (1 not defined because of singularities)
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.506038   0.009055  55.888  < 2e-16 ***
## PCIP05       0.738557   0.309036   2.390  0.01695 *  
## PCIP11      -0.132532   0.032839  -4.036 5.65e-05 ***
## PCIP14      -0.345290   0.027288 -12.654  < 2e-16 ***
## PCIP15      -0.388494   0.092301  -4.209 2.68e-05 ***
## PCIP16       0.134374   0.203992   0.659  0.51015    
## PCIP23       0.513068   0.129438   3.964 7.64e-05 ***
## PCIP24       0.095072   0.022843   4.162 3.29e-05 ***
## PCIP26       0.024745   0.039982   0.619  0.53606    
## PCIP27      -0.346416   0.220971  -1.568  0.11712    
## PCIP30       0.191170   0.037700   5.071 4.34e-07 ***
## PCIP38      -0.131965   0.050277  -2.625  0.00874 ** 
## PCIP40       0.029291   0.149801   0.196  0.84499    
## PCIP42       0.200817   0.028806   6.971 4.29e-12 ***
## PCIP43       0.078830   0.035220   2.238  0.02532 *  
## PCIP45       0.101860   0.044925   2.267  0.02348 *  
## PCIP50       0.111406   0.016224   6.867 8.82e-12 ***
## PCIP51       0.314181   0.012934  24.290  < 2e-16 ***
## PCIP52       0.133392   0.015839   8.422  < 2e-16 ***
## PCIP54      -0.919698   0.206975  -4.444 9.35e-06 ***
## PREDDEG            NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09472 on 1933 degrees of freedom
##   (160 observations deleted due to missingness)
## Multiple R-squared:  0.4182, Adjusted R-squared:  0.4124 
## F-statistic: 73.12 on 19 and 1933 DF,  p-value: < 2.2e-16

student_body_lm <- lm(FEMALE~.-INSTNM, data = demographics)

summary(student_body_lm)
## 
## Call:
## lm(formula = FEMALE ~ . - INSTNM, data = demographics)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.46126 -0.04759  0.00420  0.05736  0.42579 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -4.003e-01  2.004e-01  -1.998 0.045978 *  
## PPTUG_EF     1.130e-01  3.718e-02   3.038 0.002433 ** 
## SAT_AVG     -7.701e-05  3.083e-05  -2.498 0.012620 *  
## AGE_ENTRY    1.402e-02  1.998e-03   7.015 3.85e-12 ***
## FIRST_GEN    1.787e-02  5.657e-02   0.316 0.752138    
## NON_WHITE    8.432e-02  1.671e-02   5.046 5.20e-07 ***
## logFAMINC    6.294e-02  1.668e-02   3.774 0.000168 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09943 on 1190 degrees of freedom
##   (916 observations deleted due to missingness)
## Multiple R-squared:  0.1426, Adjusted R-squared:  0.1382 
## F-statistic: 32.98 on 6 and 1190 DF,  p-value: < 2.2e-16