The gender gap at many higher education institutions has been declining in recent years. Currently, among schools reporting data and primarily offering four-year undergraduate degrees, the average female attendance rate is 59.46%.
To show this another way, here is the distribution of female attendance rates for these same schools.
The University of Massachusetts-Lowell falls below the national average of 59.46%. Only 43.44% of UMASS-Lowell students are female, 16.02 percentage points below the national average.
Demonstrated Graphically:
UMASS-Lowell’s female attendance rate lags well behind the national average, which suggests that the difference is unlikely to have occurred randomly. With this observation in mind, the next question to consider is what systemic factors may account for this difference. Something about UMASS-Lowell may make it a less competitive option for female students. Isolating which differences account for the deficit of female students will help focus attention on possible options to correct this trend.
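One way to make "unlikely to have occurred randomly" concrete is to ask where a single school's female share sits within the distribution of all schools. The sketch below uses simulated shares rather than the data behind this study. The names `female_share` and `school_share` are hypothetical, and the simulated spread is an arbitrary assumption. A percentile of this kind is descriptive only and is not a formal test.

```r
# Simulated female attendance shares for 2,000 hypothetical schools.
# The mean matches the national average quoted in the text; the spread
# (sd = 0.12) is an assumption made purely for illustration.
set.seed(42)
female_share <- pmin(pmax(rnorm(2000, mean = 0.5946, sd = 0.12), 0), 1)

# Female share reported in the text for the school of interest.
school_share <- 0.4344

# Standardized distance from the mean of the simulated distribution.
z_score <- (school_share - mean(female_share)) / sd(female_share)

# Empirical percentile: fraction of simulated schools with a lower share.
percentile <- mean(female_share < school_share)

cat(sprintf("z = %.2f, percentile = %.1f%%\n", z_score, 100 * percentile))
```

With the real distribution in place of the simulated one, the same two numbers would describe how unusual the school's female share is among its peers.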
When and why? These are the two main questions that must be answered in order to understand UMASS-Lowell’s gender gap.
When in the admissions process is UMASS-Lowell losing female applicants?
Why is UMASS-Lowell losing these applicants?
In this analysis I will only focus on possible factors for why UMASS-Lowell is losing female students. The factors impacting gender gaps nationally can be broken up into two large categories: systematic and individual. Systematic causes are things that can be compared from school to school. They are deterministic. For example, university X has a low gender gap because of program Y. College A has a higher gender gap because it is located in region B. Individual causes are more coincidental and cannot be tested by a model. For instance, school C has a high gender gap because of scandal Z. I will only focus on systematic causes.
To understand potential systematic causes I will frame UMASS-Lowell’s gender gap in terms of three possible explanatory hypotheses. Modeling these three hypotheses, I will isolate the unique attributes of UMASS-Lowell which may begin to account for its gender gap.
The three hypotheses I will use are:
Hypothesis 1: The Institutional Approach
Hypothesis 2: The Student Body Approach
Hypothesis 3: The Programmatic Approach
The institutional approach will consider the fixed aspects of UMASS-Lowell which might affect its gender gap (e.g., whether it is public or private, the region of the country it is located in, or the level of urbanization of its location).
The student body approach will analyze to what extent the student body that UMASS-Lowell currently attracts is necessarily a more male-dominated population. This method will highlight demographic disparities between UMASS-Lowell and other higher educational institutions which may account for UMASS-Lowell’s gender gap. This model will highlight communities from which an increased marketing focus may produce more female enrollment.
The programmatic approach will compare degree program participation rates across the country with female attendance rate. With this method, I will seek to understand how certain types of programs attract a larger female student body. I will compare the national rates of programs that are highly correlated with high female enrollment rates and UMASS-Lowell’s degree programs. Similar to the student body approach, this method will highlight what programmatic areas UMASS-Lowell lacks for which an increased focus may result in higher female enrollment.
The models derived from these hypothetical explanations for UMASS-Lowell’s gender gap do not seek to demonstrate causation. Rather, they highlight areas that are highly correlated with high female attendance rates. The purpose of this study is to emphasize areas of interest. Further research is necessary to determine how effective a focus on these areas would be in increasing female enrollment. At the end of this study I will include a series of next step recommendations.
Finally, I will look at both a generalized (national) model, and a model that is restricted to only UMASS-Lowell’s cohort. This serves two important purposes. First, it will show if specific factors affecting female enrollment persist on both a national and localized level. Second, it will show to what extent UMASS-Lowell’s cohort helps to account for its gender gap.
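The national-versus-cohort comparison can be sketched as fitting one formula on two sets of rows and placing the coefficients side by side. Everything in this example is simulated. The `in_cohort` flag is a hypothetical stand-in for however the cohort is actually defined, and only two predictors are used to keep the sketch short.

```r
# Simulated institution-level data; all values are made up for illustration.
set.seed(1)
n <- 500
schools <- data.frame(
  PCIP14    = runif(n, 0, 0.4),   # share of degrees awarded in engineering
  NON_WHITE = runif(n, 0, 1),     # share of non-white students
  in_cohort = rep(c(TRUE, FALSE), times = c(100, 400))  # hypothetical cohort flag
)
schools$FEMALE <- 0.55 - 0.35 * schools$PCIP14 + 0.09 * schools$NON_WHITE +
  rnorm(n, sd = 0.05)

# One formula, fitted once on all schools and once on the cohort only.
model_formula <- FEMALE ~ PCIP14 + NON_WHITE
national_lm <- lm(model_formula, data = schools)
cohort_lm   <- lm(model_formula, data = subset(schools, in_cohort))

# Side-by-side coefficients show which relationships persist in the cohort.
coef_compare <- cbind(national = coef(national_lm), cohort = coef(cohort_lm))
print(round(coef_compare, 3))
```

A coefficient that keeps its sign and rough size in both columns points to a factor that matters nationally and locally. One that shrinks or flips in the cohort column suggests the national pattern does not carry over.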
The variables used to develop this regression describe the three hypotheses. Each hypothesis can be broken into a set of variables as follows.
Institutional Approach:
Enrollment
Admission Rate
Public vs. Not for Profit Private vs. For Profit Private
Region of the United States
Local Setting
Commuter vs. Residential
Religious vs. Non-religious
Transfer in Rate
Student Body Approach:
Average SAT score
Average age of student upon entry
% Part-time students
% First generation
% non-white
Natural Log of Family Income
Programmatic Approach:
Ethnic/Cultural/Gender Studies
Computer, Information Sciences, and Support Services
Engineering
Engineering Technologies and Engineering-Related Fields
Foreign Languages, Literatures, and Linguistics
English Language and Literature/Letters
Liberal Arts and Sciences, General Studies and Humanities
Library Science
Biological and Biomedical Sciences
Mathematics and Statistics
Multi/Interdisciplinary Studies
Philosophy and Religious Studies
Physical Sciences
Psychology
Homeland Security, Law Enforcement, Firefighting, and Related Protective Services
Social Sciences
Visual and Performing Arts
Health Professions and Related Programs
Business, Management, Marketing, and Related Support Services
History
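The regression output at the end of this document shows `CONTROL`, `REGION`, and `LOCALE` entering the models as categorical indicators, and family income entering as `logFAMINC`. The following is a minimal sketch of that kind of preparation on a few made-up rows. The raw column name `FAMINC` and all values are assumptions, not the actual preprocessing used for this study.

```r
# A few invented rows with the kinds of columns listed above.
raw <- data.frame(
  INSTNM  = c("School A", "School B", "School C", "School D"),
  CONTROL = c(1, 2, 3, 1),        # ownership category code
  REGION  = c(1, 2, 5, 8),        # region code
  LOCALE  = c(11, 21, 13, 41),    # local setting code
  FAMINC  = c(45000, 82000, 30500, 61000)  # hypothetical average family income
)

# Numeric codes become factors so that lm() estimates one coefficient per
# category instead of treating the codes as quantities.
prepared <- transform(
  raw,
  CONTROL   = factor(CONTROL),
  REGION    = factor(REGION),
  LOCALE    = factor(LOCALE),
  logFAMINC = log(FAMINC)         # natural log of family income
)
prepared$FAMINC <- NULL           # keep only the logged version

str(prepared)
```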
I will compare each unique hypothesis with the other hypotheses using the adjusted R-squared for each model in order to determine which hypothesis has the overall greatest impact on the gender gap. Next, I will compare the individual components of each hypothesis against one another by including them all in one large model and comparing the coefficients of the significant variables. I will then measure how UMASS-Lowell compares to other schools based on the variables which play a significant role in the gender gap in order to determine areas UMASS-Lowell can improve. Finally, I will take the same regression model and narrow the observations I analyze to only those schools within UMASS-Lowell’s cohort. I will then determine which of the variables of interest are still impactful at the localized level. Additionally, I will compare UMASS-Lowell to other schools in its cohort to see if it remains an outlier based on its gender gap.
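The first step, comparing hypotheses by adjusted R-squared, can be sketched as follows. The data are simulated and each hypothesis is reduced to one or two predictors, so the numbers printed here say nothing about the real results. The column names mirror those in the regression output at the end of this document.

```r
# Simulated data: one or two stand-in predictors per hypothesis.
set.seed(7)
n <- 400
d <- data.frame(
  ADM_RATE  = runif(n, 0.2, 1),   # institutional
  NON_WHITE = runif(n, 0, 1),     # student body
  PCIP14    = runif(n, 0, 0.4),   # programmatic: engineering share
  PCIP51    = runif(n, 0, 0.5)    # programmatic: health professions share
)
d$FEMALE <- 0.5 + 0.04 * d$ADM_RATE + 0.09 * d$NON_WHITE -
  0.35 * d$PCIP14 + 0.30 * d$PCIP51 + rnorm(n, sd = 0.07)

# One model per hypothesis.
models <- list(
  institutional = lm(FEMALE ~ ADM_RATE, data = d),
  student_body  = lm(FEMALE ~ NON_WHITE, data = d),
  programmatic  = lm(FEMALE ~ PCIP14 + PCIP51, data = d)
)

# Adjusted R-squared penalizes extra predictors, which makes it a fairer
# yardstick than plain R-squared when models differ in size.
adj_r2 <- sapply(models, function(m) summary(m)$adj.r.squared)
print(round(sort(adj_r2, decreasing = TRUE), 3))
```

When the models are fitted to real data with missing values, each one may be estimated on a different set of schools, which is worth keeping in mind when reading the comparison.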
The results show several different factors from each hypothesis are impactful in explaining variation in gender gaps across institutions in the United States. Though these variables of interest do not demonstrate causation, they are highly correlated with variance in the gender gap. I will discuss each area at length below and compare them to UMASS-Lowell.
The most impactful hypothesis was the programmatic approach (adjusted R-squared of 0.41). The types of programs offered at a school are most highly correlated with variation in the gender gap. The next most impactful hypothesis was the student body approach (adjusted R-squared of 0.14). The institutional approach was least correlated with variation in the gender gap (adjusted R-squared of 0.10), although it still accounted for a noticeable amount of variation.
Next, we need to consider the individual elements more and see what factors in each category are influential.
Ordered from the greatest positive effect on female attendance rates (correlated with higher attendance) to the greatest negative effect (correlated with lower attendance), the programs were:
Greatest Positive Impact
Neutral (Least Positive Impact / Least Negative Impact)
Greatest Negative Impact
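An ordering like this can be produced by pulling the coefficient table out of a fitted model, keeping the significant terms, and sorting by estimate. The sketch below does so on simulated program shares. The 0.05 cutoff is an assumption, and `PCIP52` is deliberately given no true effect in the simulation.

```r
# Simulated program shares; values are invented for illustration.
set.seed(3)
n <- 600
p <- data.frame(
  PCIP11 = runif(n, 0, 0.2),
  PCIP14 = runif(n, 0, 0.4),
  PCIP51 = runif(n, 0, 0.5),
  PCIP52 = runif(n, 0, 0.5)   # no true effect in this simulation
)
p$FEMALE <- 0.55 - 0.5 * p$PCIP11 - 0.35 * p$PCIP14 + 0.28 * p$PCIP51 +
  rnorm(n, sd = 0.07)

fit <- lm(FEMALE ~ ., data = p)

# Coefficient table as a data frame with easier column names.
coefs <- as.data.frame(coef(summary(fit)))
names(coefs) <- c("estimate", "std_error", "t_value", "p_value")
coefs$term <- rownames(coefs)

# Keep significant program terms, ordered from most positive to most negative.
ranked <- subset(coefs, term != "(Intercept)" & p_value < 0.05)
ranked <- ranked[order(ranked$estimate, decreasing = TRUE),
                 c("term", "estimate", "p_value")]
print(ranked, row.names = FALSE)
```

Because every program variable is a share of degrees awarded, the raw estimates are on the same scale and can be compared directly.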
UMASS-Lowell compared by programs to the national average:
UMASS-Lowell’s involvement rate for programs correlated with higher female enrollment is similar to the national average. It is more competitive in areas that are correlated with lower female enrollment. This difference is significant. It is likely then that programmatic offerings at UMASS-Lowell play a role in describing its gender gap. Specifically, the computer sciences and engineering programs are very popular at UMASS-Lowell, but unpopular with female students.
Four student body descriptors seem to be correlated with variation in the gender gap. In order of impact they are:
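A school-versus-national comparison of this kind amounts to subtracting column means from one school's row. The sketch below shows the mechanics on simulated shares. The row labeled "School 1" is a hypothetical stand-in for the school of interest, and none of the values are real.

```r
# Simulated program shares for 300 hypothetical schools.
set.seed(11)
n <- 300
shares <- data.frame(
  INSTNM = paste("School", seq_len(n)),
  PCIP11 = runif(n, 0, 0.10),
  PCIP14 = runif(n, 0, 0.15),
  PCIP51 = runif(n, 0, 0.30)
)
program_cols <- c("PCIP11", "PCIP14", "PCIP51")

# Overwrite one row to play the role of the school of interest (invented values).
shares[1, program_cols] <- c(0.12, 0.20, 0.15)

# National average share per program, and the selected school's own shares.
national_mean <- colMeans(shares[, program_cols])
school_row <- unlist(shares[shares$INSTNM == "School 1", program_cols])

comparison <- data.frame(
  program       = program_cols,
  school        = school_row,
  national_mean = national_mean,
  difference    = school_row - national_mean,
  row.names     = NULL
)
print(comparison)
```

Positive differences on programs with negative coefficients, such as the engineering share here, are the cases that would pull a school's female attendance rate down.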
| Factor | Impact | Rank |
|---|---|---|
| Higher Rate of Non-white Students | Higher Female Enrollment | 1st |
| Higher Rate of Part-time Students | Higher Female Enrollment | 2nd |
| Higher Average Age of Entry | Higher Female Enrollment | 3rd |
| Higher SAT Average | Higher Female Enrollment | 4th |
UMASS-Lowell compared by student body to the national average:
Higher diversity is correlated with a higher rate of female enrollment. UMASS-Lowell is less diverse than the average school in the United States. This is the most impactful student body descriptor on female enrollment. For all other significant factors, UMASS-Lowell is at or above the national average, so it does not appear that these factors are reducing its female enrollment.
Several institutional factors were correlated with variance in female attendance.
| Factor | Impact | UMASS-Lowell |
|---|---|---|
| Private School | Negative | No |
| Region: Mid East | Positive | No |
| Region: Southeast | Positive | No |
| Region: Rocky Mountains | Negative | No |
| Region: Far West | Positive | No |
| All other regions | No significant impact | - |
| Urban/Suburban/Rural | Positive/Neutral/Negative | Neutral/Positive |
| Higher Admission Rate | Positive | - |
UMASS-Lowell’s female attendance rate does not seem to be significantly impacted by its institutional characteristics. Though institutional characteristics do play a role when comparing all schools, these factors do not appear to be impacting UMASS-Lowell (either positively or negatively).
The research contained above does not attempt to understand what causes lower female attendance. Rather, I attempt to determine what factors are correlated with decreased female attendance, and which of these characteristics UMASS-Lowell should investigate further in attempting to correct its own gender gap. I compared three main hypotheses to determine which was most impactful on a national level in determining female enrollment. My research suggests that programmatic offerings are the most correlated with fluctuations in female attendance rates, followed by student body makeup and institutional factors. Of these hypotheses, UMASS-Lowell had high rates of participation in several programs that are highly correlated on a national level with reduced female enrollment. For programs correlated with higher female attendance, UMASS-Lowell was average when compared nationally. Looking at student body factors, female attendance is correlated with more diversity, more part-time students, a higher average age of entry, and higher average SAT scores. Higher diversity was the factor most highly correlated with higher female attendance, and it was the area in which UMASS-Lowell struggled the most. Finally, UMASS-Lowell was average for most institutional factors. It does not appear that UMASS-Lowell’s gender gap is heavily influenced by institutional factors like region, ownership, or level of urbanization.
gender_gap_lm <- lm(FEMALE~. -FEMALE.x-FEMALE.y-INSTNM, data = gender_gap_data)
summary(gender_gap_lm)
##
## Call:
## lm(formula = FEMALE ~ . - FEMALE.x - FEMALE.y - INSTNM, data = gender_gap_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.37806 -0.03913 0.00322 0.03987 0.35989
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.490e-01 1.944e-01 -1.796 0.072804 .
## UGDS -4.181e-07 3.902e-07 -1.071 0.284199
## ADM_RATE 6.594e-02 1.412e-02 4.669 3.36e-06 ***
## CONTROL2 -1.827e-02 7.795e-03 -2.344 0.019239 *
## CONTROL3 1.084e-02 2.764e-02 0.392 0.694929
## REGION2 2.144e-02 9.655e-03 2.220 0.026580 *
## REGION3 1.076e-02 1.035e-02 1.040 0.298618
## REGION4 -7.527e-03 1.109e-02 -0.678 0.497593
## REGION5 2.171e-02 1.018e-02 2.133 0.033148 *
## REGION6 -5.865e-03 1.235e-02 -0.475 0.635054
## REGION7 -3.051e-02 1.529e-02 -1.996 0.046198 *
## REGION8 2.310e-02 1.177e-02 1.964 0.049811 *
## REGION9 6.951e-02 4.277e-02 1.625 0.104380
## LOCALE12 -1.099e-02 7.898e-03 -1.392 0.164162
## LOCALE13 -1.071e-02 7.815e-03 -1.370 0.170903
## LOCALE21 -2.576e-02 7.203e-03 -3.575 0.000363 ***
## LOCALE22 -2.029e-02 1.303e-02 -1.558 0.119562
## LOCALE23 -3.045e-02 1.484e-02 -2.051 0.040442 *
## LOCALE31 -5.010e-02 1.112e-02 -4.506 7.25e-06 ***
## LOCALE32 -2.921e-02 8.441e-03 -3.461 0.000557 ***
## LOCALE33 -3.997e-02 9.808e-03 -4.075 4.89e-05 ***
## LOCALE41 -5.029e-02 1.357e-02 -3.705 0.000221 ***
## LOCALE42 -8.142e-02 2.162e-02 -3.766 0.000173 ***
## LOCALE43 -4.026e-02 2.182e-02 -1.845 0.065290 .
## COMMUTERresidential 7.081e-03 6.523e-03 1.086 0.277845
## RELIGIOUSreligious -7.671e-03 6.682e-03 -1.148 0.251195
## TRANSFER_INlow 9.679e-03 5.320e-03 1.819 0.069102 .
## PCIP05 -1.908e-02 3.241e-01 -0.059 0.953054
## PCIP11 -5.308e-01 6.828e-02 -7.774 1.60e-14 ***
## PCIP14 -3.638e-01 3.099e-02 -11.742 < 2e-16 ***
## PCIP15 -2.183e-01 9.841e-02 -2.218 0.026724 *
## PCIP16 1.607e-02 2.110e-01 0.076 0.939305
## PCIP23 7.472e-01 1.368e-01 5.463 5.66e-08 ***
## PCIP24 7.694e-02 3.480e-02 2.211 0.027227 *
## PCIP26 1.158e-01 5.434e-02 2.131 0.033273 *
## PCIP27 9.309e-02 2.150e-01 0.433 0.665147
## PCIP30 9.795e-02 5.676e-02 1.726 0.084673 .
## PCIP38 -2.204e-01 9.807e-02 -2.248 0.024757 *
## PCIP40 -1.559e-01 1.518e-01 -1.027 0.304618
## PCIP42 1.958e-01 5.448e-02 3.594 0.000338 ***
## PCIP43 -2.128e-02 4.499e-02 -0.473 0.636268
## PCIP45 1.551e-03 5.239e-02 0.030 0.976387
## PCIP50 1.625e-01 2.756e-02 5.896 4.80e-09 ***
## PCIP51 2.759e-01 2.130e-02 12.955 < 2e-16 ***
## PCIP52 -8.692e-03 2.523e-02 -0.345 0.730520
## PCIP54 -1.534e-01 1.953e-01 -0.785 0.432560
## PPTUG_EF 7.168e-02 3.125e-02 2.294 0.021982 *
## SAT_AVG 1.371e-04 3.502e-05 3.915 9.53e-05 ***
## AGE_ENTRY 1.432e-02 1.641e-03 8.724 < 2e-16 ***
## FIRST_GEN -1.950e-02 4.936e-02 -0.395 0.692849
## NON_WHITE 9.092e-02 1.669e-02 5.447 6.17e-08 ***
## logFAMINC 3.156e-02 1.622e-02 1.946 0.051929 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.07309 on 1226 degrees of freedom
## (973 observations deleted due to missingness)
## Multiple R-squared: 0.5619, Adjusted R-squared: 0.5436
## F-statistic: 30.83 on 51 and 1226 DF, p-value: < 2.2e-16
plot(gender_gap_lm)
instituional_lm <- lm(FEMALE~. -INSTNM, data = inst_controls_2)
summary(instituional_lm)
##
## Call:
## lm(formula = FEMALE ~ . - INSTNM, data = inst_controls_2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.49277 -0.05371 0.00474 0.06039 0.38808
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.002e-01 1.119e-01 2.684 0.007350 **
## UGDS -9.374e-07 4.788e-07 -1.958 0.050411 .
## ADM_RATE 3.973e-02 1.506e-02 2.638 0.008416 **
## CONTROL2 1.341e-02 9.287e-03 1.444 0.148832
## CONTROL3 6.737e-02 1.584e-02 4.254 2.23e-05 ***
## REGION1 3.260e-01 1.122e-01 2.905 0.003728 **
## REGION2 3.252e-01 1.119e-01 2.906 0.003715 **
## REGION3 3.092e-01 1.121e-01 2.757 0.005902 **
## REGION4 3.000e-01 1.123e-01 2.671 0.007643 **
## REGION5 3.247e-01 1.120e-01 2.899 0.003795 **
## REGION6 2.926e-01 1.124e-01 2.604 0.009314 **
## REGION7 2.565e-01 1.134e-01 2.262 0.023843 *
## REGION8 3.223e-01 1.122e-01 2.873 0.004125 **
## REGION9 2.906e-01 1.136e-01 2.558 0.010612 *
## LOCALE12 -1.204e-02 1.050e-02 -1.146 0.251914
## LOCALE13 -2.131e-02 9.977e-03 -2.136 0.032841 *
## LOCALE21 -2.796e-02 9.002e-03 -3.106 0.001932 **
## LOCALE22 -3.363e-02 1.660e-02 -2.026 0.042950 *
## LOCALE23 -1.931e-02 2.005e-02 -0.963 0.335651
## LOCALE31 -5.454e-02 1.601e-02 -3.407 0.000674 ***
## LOCALE32 -5.453e-02 1.102e-02 -4.947 8.37e-07 ***
## LOCALE33 -5.107e-02 1.343e-02 -3.801 0.000150 ***
## LOCALE41 -4.163e-02 1.796e-02 -2.318 0.020566 *
## LOCALE42 -7.918e-02 2.495e-02 -3.173 0.001537 **
## LOCALE43 -3.352e-02 3.075e-02 -1.090 0.275851
## COMMUTERresidential -4.587e-02 7.628e-03 -6.014 2.26e-09 ***
## RELIGIOUSreligious 7.401e-03 7.972e-03 0.928 0.353395
## TRANSFER_INlow -1.579e-02 6.239e-03 -2.530 0.011494 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1111 on 1532 degrees of freedom
## (553 observations deleted due to missingness)
## Multiple R-squared: 0.1186, Adjusted R-squared: 0.103
## F-statistic: 7.633 on 27 and 1532 DF, p-value: < 2.2e-16
programatic_lm <- lm(FEMALE~.-INSTNM, data = programs)
summary(programatic_lm)
##
## Call:
## lm(formula = FEMALE ~ . - INSTNM, data = programs)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.51814 -0.04726 0.00479 0.05226 0.42812
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.506038 0.009055 55.888 < 2e-16 ***
## PCIP05 0.738557 0.309036 2.390 0.01695 *
## PCIP11 -0.132532 0.032839 -4.036 5.65e-05 ***
## PCIP14 -0.345290 0.027288 -12.654 < 2e-16 ***
## PCIP15 -0.388494 0.092301 -4.209 2.68e-05 ***
## PCIP16 0.134374 0.203992 0.659 0.51015
## PCIP23 0.513068 0.129438 3.964 7.64e-05 ***
## PCIP24 0.095072 0.022843 4.162 3.29e-05 ***
## PCIP26 0.024745 0.039982 0.619 0.53606
## PCIP27 -0.346416 0.220971 -1.568 0.11712
## PCIP30 0.191170 0.037700 5.071 4.34e-07 ***
## PCIP38 -0.131965 0.050277 -2.625 0.00874 **
## PCIP40 0.029291 0.149801 0.196 0.84499
## PCIP42 0.200817 0.028806 6.971 4.29e-12 ***
## PCIP43 0.078830 0.035220 2.238 0.02532 *
## PCIP45 0.101860 0.044925 2.267 0.02348 *
## PCIP50 0.111406 0.016224 6.867 8.82e-12 ***
## PCIP51 0.314181 0.012934 24.290 < 2e-16 ***
## PCIP52 0.133392 0.015839 8.422 < 2e-16 ***
## PCIP54 -0.919698 0.206975 -4.444 9.35e-06 ***
## PREDDEG NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09472 on 1933 degrees of freedom
## (160 observations deleted due to missingness)
## Multiple R-squared: 0.4182, Adjusted R-squared: 0.4124
## F-statistic: 73.12 on 19 and 1933 DF, p-value: < 2.2e-16
student_body_lm <- lm(FEMALE~.-INSTNM, data = demographics)
summary(student_body_lm)
##
## Call:
## lm(formula = FEMALE ~ . - INSTNM, data = demographics)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.46126 -0.04759 0.00420 0.05736 0.42579
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.003e-01 2.004e-01 -1.998 0.045978 *
## PPTUG_EF 1.130e-01 3.718e-02 3.038 0.002433 **
## SAT_AVG -7.701e-05 3.083e-05 -2.498 0.012620 *
## AGE_ENTRY 1.402e-02 1.998e-03 7.015 3.85e-12 ***
## FIRST_GEN 1.787e-02 5.657e-02 0.316 0.752138
## NON_WHITE 8.432e-02 1.671e-02 5.046 5.20e-07 ***
## logFAMINC 6.294e-02 1.668e-02 3.774 0.000168 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09943 on 1190 degrees of freedom
## (916 observations deleted due to missingness)
## Multiple R-squared: 0.1426, Adjusted R-squared: 0.1382
## F-statistic: 32.98 on 6 and 1190 DF, p-value: < 2.2e-16