Set Up

Clear Work Space

## EMPTY FUNCTIONS AND VAR IN THE ENV TAB/WINDOW #

# Clear the workspace
  rm(list = ls()) # Clear environment
  gc()            # Run garbage collection to free unused memory
##          used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
## Ncells 530214 28.4    1180197 63.1         NA   669400 35.8
## Vcells 983310  7.6    8388608 64.0      16384  1851654 14.2
  cat("\f")       # Clear the console

Install Packages

## LOAD PACKAGES ##

# Prepare needed libraries
packages <- c("psych",       # quick summary stats for data exploration,
              "stargazer",   # summary stats for sharing,
              "tidyverse",   # data manipulation like selecting variables,
              "corrplot",    # correlation plots
              "ggplot2",     # graphing
              "data.table",  # reshape for graphing 
              "car",         # vif
              "reshape2",    # melt
              "readxl",      # read and open xl files
              "visdat"       # visualize data types and missing values
              )

  # Install any package that is missing, then attach it
  for (pkg in packages) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg
                       , repos = "https://cran.rstudio.com/"
                       , dependencies = TRUE
                       )
    }
    library(pkg, character.only = TRUE)
  }
## 
## Please cite as:
##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ ggplot2::%+%()   masks psych::%+%()
## ✖ ggplot2::alpha() masks psych::alpha()
## ✖ dplyr::filter()  masks stats::filter()
## ✖ dplyr::lag()     masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## corrplot 0.92 loaded
## 
## 
## Attaching package: 'data.table'
## 
## 
## The following objects are masked from 'package:lubridate':
## 
##     hour, isoweek, mday, minute, month, quarter, second, wday, week,
##     yday, year
## 
## 
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
## 
## 
## The following object is masked from 'package:purrr':
## 
##     transpose
## 
## 
## Loading required package: carData
## 
## 
## Attaching package: 'car'
## 
## 
## The following object is masked from 'package:dplyr':
## 
##     recode
## 
## 
## The following object is masked from 'package:purrr':
## 
##     some
## 
## 
## The following object is masked from 'package:psych':
## 
##     logit
## 
## 
## 
## Attaching package: 'reshape2'
## 
## 
## The following objects are masked from 'package:data.table':
## 
##     dcast, melt
## 
## 
## The following object is masked from 'package:tidyr':
## 
##     smiths
# Prefer dplyr::select() over other packages' select() (requires the conflicted package to be installed)
conflicted::conflict_prefer(name = "select", 
                            winner = "dplyr")
## [conflicted] Will prefer dplyr::select over any other package.
rm(packages)

Import Data

# IMPORT DATA #

# Load required package
library(readxl)

# Read excel file
Data_Gender_composition_at_ranked_universities <- read_excel("Downloads/Take-home assignment #1-5/Data - Gender composition at ranked universities.xlsx")
# View(Data_Gender_composition_at_ranked_universities)

# Convert the tibble to a base R data frame called df_data
df_data <- as.data.frame(Data_Gender_composition_at_ranked_universities)
# 98 observations of 27 variables

Introduction

This paper seeks to measure the causal effect of the percentage of female Economics freshmen on an institution's academic ranking. Examining this relationship lets us determine whether there is a significant connection between the representation of women in Economics programs and the academic ranking of a college or university. It may also shed light on potential barriers to women's success in the field of Economics.

Methodology

Data

The data used in the following analysis come from the Integrated Postsecondary Education Data System (IPEDS), provided by the National Center for Education Statistics (NCES). The cross-sectional data set describes the gender composition of students and faculty at 98 top-ranked, medium-to-large universities in the United States in 2015.

Fewer than 0.1% of the values in the raw data set were missing; I replaced each missing value with the median of its variable.
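Median imputation of this kind takes only a few lines of base R. The sketch below is an assumption about how it could be done, not the code used for this paper: it imputes every numeric column with that column's median and leaves non-numeric columns (such as an institution-name column, if one exists) untouched. The toy data frame and its column names are hypothetical.

```r
# Replace each NA in a numeric column with the median of that column.
# Assumption: non-numeric columns are left as they are.
impute_median <- function(df) {
  num_cols <- vapply(df, is.numeric, logical(1))
  df[num_cols] <- lapply(df[num_cols], function(x) {
    x[is.na(x)] <- median(x, na.rm = TRUE)
    x
  })
  df
}

# Hypothetical toy data, not the IPEDS file
toy <- data.frame(
  name            = c("A", "B", "C", "D"),
  SATMath75       = c(700, NA, 650, 760),
  Acceptance_Rate = c(10, 50, NA, 30)
)
toy_imputed <- impute_median(toy)
toy_imputed
```

On the real data this would be a single call such as `df_data <- impute_median(df_data)`.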

Sample

To better capture gender representation within each institution's Economics program, I added four variables to the data set, each expressing a count as a percentage of the corresponding total:

  1. EconFacultyFemale_Percentage: the percentage of Economics faculty who are female, computed as the number of female Economics faculty divided by the total number of Economics faculty (EconFaculty_Female / EconFaculty_Total), multiplied by 100. This variable accounts for the gender composition of the faculty, which plays an important role in the overall dynamics of an Economics program.

  2. EconFreshmenFemale_Percentage: the percentage of Economics freshmen who are female, computed in the same way from the number of female Economics freshmen and the total number of Economics freshmen. This is the main independent variable in the regression analysis and allows us to examine its relationship with each institution's academic ranking.

  3. FreshmenFemale_Percentage: the percentage of all freshmen, across every field of study, who are female. It gives a broader view of gender diversity within the student body.

  4. FacultyFemale_Percentage: the percentage of all faculty members, across every academic discipline, who are female. It gives a broader view of gender representation across the entire faculty.

Expressing these counts as percentages of the corresponding totals removes differences in institution size, which makes comparisons across institutions more meaningful: each variable captures the relative representation of female students or faculty within a university rather than raw headcounts.
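The four percentage variables can be sketched in base R as follows. Only `EconFaculty_Female` and `EconFaculty_Total` are named in the text; the other count columns (`FreshmenEcon_Female`, `FreshmenEcon_Total`, `Freshmen_Female`, `Freshmen_Total`, `Faculty_Female`, `Faculty_Total`) are hypothetical placeholders for the corresponding IPEDS columns. The ratios are multiplied by 100 to match the 0 to 100 scale of the summary table.

```r
# Append the four female-share variables (in percent) to a data frame.
# Column names other than EconFaculty_Female / EconFaculty_Total are
# HYPOTHETICAL placeholders.
add_female_percentages <- function(df) {
  df$EconFacultyFemale_Percentage  <- 100 * df$EconFaculty_Female  / df$EconFaculty_Total
  df$EconFreshmenFemale_Percentage <- 100 * df$FreshmenEcon_Female / df$FreshmenEcon_Total
  df$FreshmenFemale_Percentage     <- 100 * df$Freshmen_Female     / df$Freshmen_Total
  df$FacultyFemale_Percentage      <- 100 * df$Faculty_Female      / df$Faculty_Total
  df
}

# Hypothetical single-institution example
toy <- data.frame(
  EconFaculty_Female  = 7,    EconFaculty_Total  = 28,
  FreshmenEcon_Female = 60,   FreshmenEcon_Total = 200,
  Freshmen_Female     = 2300, Freshmen_Total     = 4400,
  Faculty_Female      = 750,  Faculty_Total      = 1900
)
toy_pct <- add_female_percentages(toy)
toy_pct[, grep("_Percentage$", names(toy_pct))]
```

Because stargazer applies `covariate.labels` to the summary rows by position, the order in which these columns are appended should determine the order of the labels in the summary-statistics table.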

After the adjustments and the addition of the four percentage variables, the data set contains 98 observations of 31 variables. The following table reports summary statistics (mean, standard deviation, median, minimum, and maximum) for the numeric variables in the data frame.

# SUMMARY STATISTICS #

# Load required packages
require(stargazer)

# Create label for table
labels <- c(
  'Academic Ranking',
  'Faculty Total',
  'Faculty Female',
  'Faculty Male',
  'Econ Faculty Total',
  'Econ Faculty Female',
  'Econ Faculty Male',
  'Freshmen Total',
  'Freshmen Male',
  'Freshmen Female',
  'Freshmen Econ Total',
  'Freshmen Econ Male',
  'Freshmen Econ Female',
  'Econ Degrees Granted Total',
  'Econ Degrees Granted Male',
  'Econ Degrees Granted Female',
  'Stem Degrees Granted Total',
  'Stem Degrees Granted Male',
  'Stem Degrees Granted Female',
  'Student Faculty Ratio',
  'SATMath75 Freshmen Average',
  'Acceptance Rate (%)',
  'Student Yield Total (%)',
  'Student Yield Male (%)',
  'Student Yield Female (%)',
  'Prof Salary Average',
  # The four percentage labels must follow the column order of df_data
  'Econ Freshmen Female %',
  'Freshmen Female %',
  'Econ Faculty Female %',
  'Faculty Female %'
)

# summary statistic table
stargazer(df_data,
          type = "text",
          title = "Summary Statistics of Data",
          summary.stat = c("mean", "sd", "median", "min", "max"),
          covariate.labels = labels
          )
## 
## Summary Statistics of Data
## ===========================================================================
## Statistic                      Mean      St. Dev.   Median    Min     Max  
## ---------------------------------------------------------------------------
## Academic Ranking              47.806      27.827     48.5      1      98   
## Faculty Total                1,902.857  1,045.557    1,685    341    6,344 
## Faculty Female                747.082    426.217      655     155    2,457 
## Faculty Male                 1,155.776   629.306    1,039.5   178    3,887 
## Econ Faculty Total            33.439      14.237     30.5      9      99   
## Econ Faculty Female            7.235      3.481        7       1      19   
## Econ Faculty Male             26.204      12.307      22       6      82   
## Freshmen Total               4,407.378  2,339.158    4,337    957   10,187 
## Freshmen Male                2,109.857  1,149.279   1,969.5   385    5,245 
## Freshmen Female              2,297.520  1,221.374    2,192    486    5,190 
## Freshmen Econ Total           187.071    128.961      167      7      623  
## Freshmen Econ Male            127.082     83.303     115.5     4      326  
## Freshmen Econ Female          59.990      53.432      51       0      331  
## Econ Degrees Granted Total    157.918    109.744      138      10     544  
## Econ Degrees Granted Male     110.020     73.747      98       8      349  
## Econ Degrees Granted Female   47.898      41.788      36       2      252  
## Stem Degrees Granted Total    834.918    594.971     659.5     59    2,329 
## Stem Degrees Granted Male     519.867    389.799      409      20    1,491 
## Stem Degrees Granted Female   315.051    220.785      249      38    1,070 
## Student Faculty Ratio         15.245      5.026       16       5      27   
## SATMath75 Freshmen Average    696.643     58.580      690     570     800  
## Acceptance Rate (%)           48.378      23.658      51       6      93   
## Student Yield Total (%)       36.214      13.516      35       10     84   
## Student Yield Male (%)        37.398      13.486      36       12     83   
## Student Yield Female (%)      35.163      14.161      34       1      84   
## Prof Salary Average         137,104.200 25,784.600 131,251.5 85,824 202,464
## Econ Freshmen Female %        29.972      9.658     30.553   0.000  56.537 
## Freshmen Female %             52.166      4.301     52.239   40.237 68.493 
## Econ Faculty Female %         22.588      8.658     21.807   4.255  41.176 
## Faculty Female %              39.271      4.449     39.443   25.450 50.960 
## ---------------------------------------------------------------------------

The academic ranking of the universities in the dataset ranges from 1 to 98, where a lower number indicates a higher ranking. Its mean (47.8) and median (48.5) are very close, so the distribution is roughly symmetric; as a set of ranks, however, it is closer to uniform than to normal. The dataset includes universities of very different sizes: total faculty ranges from 341 to 6,344.
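Close mean and median values indicate symmetry, not normality. A minimal R illustration, using a hypothetical, perfectly uniform set of ranks from 1 to 98 rather than the actual data:

```r
# Hypothetical ranks: every value from 1 to 98 appears exactly once
ranks <- 1:98

mean(ranks)    # identical to the median for a symmetric set of ranks
median(ranks)
sd(ranks)      # equals sqrt(98 * 99 / 12) for consecutive integers
# hist(ranks) would show a flat histogram, not a bell curve
```

These ranks have equal mean and median (49.5) and a standard deviation of about 28.4, close to the 27.827 reported for Academic Ranking, even though their distribution is flat rather than bell-shaped.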

The number of male Economics freshmen ranges from 4 to 326, with a median of 115.5, whereas the number of female Economics freshmen ranges from 0 to 331, with a median of 51. Economics programs therefore typically enroll more male than female freshmen.

Similarly, the median number of Economics degrees granted to male students (98) is about 2.7 times the median granted to female students (36), highlighting a gender gap in the Economics degrees awarded across universities.

Moreover, the percentage of female Economics faculty ranges from 4.3% to 41.2%, with a mean and median of about 22%. Because even the maximum is below 50%, women are a minority of the Economics faculty at every institution in the sample.

Lastly, the percentage of female Economics freshmen ranges from 0% to 56.5%. Its mean (30.0%) and median (30.6%) are close, indicating a roughly symmetric distribution: at the typical institution in the dataset, about 30% of Economics freshmen are female.

In summary, the descriptive statistics illustrate the distribution and characteristics of various variables, including academic rankings, institution sizes, gender composition of freshmen and faculty within the institutions and within the Economics departments. These statistics provide an overview of the gender disparities and enrollment patterns within the Economics discipline.

Methods

I used a multivariate regression model to estimate the effect of the percentage of female freshmen in an institution's Economics department on its academic ranking in 2015, controlling for the demographic composition of students and faculty and for students' prior preparation.

The equation for my model is:

\[ \text{Academic Ranking}_i = \beta_0 + \beta_1 \cdot \text{Percentage of Female Econ Freshmen}_i + \beta_2 \cdot \alpha_i + \beta_3 \cdot \gamma_i + \beta_4 \cdot \delta_i + \epsilon_i \]

where

  • Academic Ranking, the dependent variable, is the academic ranking of the \(i\)th institution (a lower number indicates a higher ranking);

  • Percentage of Female Econ Freshmen, the main independent variable, is the percentage of Economics freshmen at the \(i\)th institution who are female;

  • \(\alpha_i\) is a vector of student-demographic controls for the \(i\)th institution: the female student yield and the percentage of female freshmen. I included these because the gender composition of the university as a whole may affect gender representation in specific programs such as Economics;

  • \(\gamma_i\) is a vector of faculty-demographic controls for the \(i\)th institution: the percentage of female faculty overall and the percentage of female Economics faculty, since the gender composition of the faculty may also affect gender representation in programs such as Economics;

  • \(\delta_i\) controls for students' preparation before entering the program. I used the freshman SAT Math score variable (SATMath75), since SAT scores may be related to the gender composition of the freshman class;

  • \(\beta_0\) is the intercept;

  • \(\beta_1\) is the coefficient of interest: the expected change in an institution's ranking number associated with a one-percentage-point increase in the percentage of female Economics freshmen, holding the controls constant;

  • \(\beta_2\) and \(\beta_3\) are vectors of coefficients on the controls in \(\alpha_i\) and \(\gamma_i\) (so \(\beta_2 \cdot \alpha_i\) and \(\beta_3 \cdot \gamma_i\) are dot products), and \(\beta_4\) is the coefficient on \(\delta_i\);

  • \(\epsilon_i\) is the error term.
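The four nested specifications reported below can be sketched in R with `lm()` and `update()`. The regression code is not shown in this report, so this is a hypothetical reconstruction: it runs on synthetic data generated from an arbitrary process, and every column name except `Academic_ranking` (which appears in the regression table) is a placeholder. It illustrates only how controls enter in stages; it does not reproduce the paper's estimates.

```r
# Synthetic stand-in data: HYPOTHETICAL column names, arbitrary values
set.seed(2015)
n <- 98
sim <- data.frame(
  EconFreshmenFemale_Percentage = rnorm(n, mean = 30, sd = 10),
  StudentYield_Female           = rnorm(n, mean = 35, sd = 14),
  FreshmenFemale_Percentage     = rnorm(n, mean = 52, sd = 4),
  EconFacultyFemale_Percentage  = rnorm(n, mean = 22, sd = 9),
  FacultyFemale_Percentage      = rnorm(n, mean = 39, sd = 4),
  SATMath75                     = rnorm(n, mean = 700, sd = 60)
)
# Arbitrary data-generating process, chosen only so the code runs
sim$Academic_ranking <- 220 - 0.5 * sim$EconFreshmenFemale_Percentage -
  0.25 * sim$SATMath75 + rnorm(n, sd = 15)

# Model 1: no controls
m1 <- lm(Academic_ranking ~ EconFreshmenFemale_Percentage, data = sim)
# Model 2: + student demographics (alpha)
m2 <- update(m1, . ~ . + StudentYield_Female + FreshmenFemale_Percentage)
# Model 3: + faculty demographics (gamma)
m3 <- update(m2, . ~ . + EconFacultyFemale_Percentage + FacultyFemale_Percentage)
# Model 4: + prior-preparation control (delta)
m4 <- update(m3, . ~ . + SATMath75)

# Estimate of beta_1 in each specification
beta1 <- sapply(list(m1, m2, m3, m4),
                function(m) coef(m)[["EconFreshmenFemale_Percentage"]])
beta1

# A side-by-side text table could then be produced with, e.g.:
# stargazer::stargazer(m1, m2, m3, m4, type = "text")
```

Using `update()` keeps each model's formula a strict superset of the previous one, which makes the stepwise comparison of \(\beta_1\) explicit.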

Results

I estimated four OLS models to measure this relationship, adding controls in stages.

In model 1, a simple OLS regression gives a raw estimate of the effect of the percentage of female Economics freshmen on an institution's academic ranking, without controlling for any observable differences across institutions. A 1-percentage-point increase in the percentage of female Economics freshmen lowers the institution's ranking number by 1.143 positions (recall that a lower number indicates a higher ranking). This estimate is significant at the 1% level. However, the model likely suffers from omitted variable bias (OVB): factors that affect an institution's ranking are plausibly also correlated with the share of female Economics freshmen, which makes the estimate unreliable. In addition, the low R2 (0.157) and large residual standard error (25.678) indicate that the model explains only a limited share of the variation in rankings, so other factors are likely influencing the outcome.
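The direction of omitted variable bias can be sketched with the textbook single-omitted-variable formula. As a simplifying assumption, suppose the true model contains one additional control \(z_i\) (for example, freshman SAT Math scores) with coefficient \(\theta\), write \(x_i\) for the percentage of female Economics freshmen, and let \(\pi_1\) be the slope from regressing \(z_i\) on \(x_i\):

\[ \text{Academic Ranking}_i = \beta_0 + \beta_1 x_i + \theta z_i + u_i, \qquad z_i = \pi_0 + \pi_1 x_i + v_i \]

\[ \operatorname{plim} \tilde{\beta}_1 = \beta_1 + \theta \, \pi_1, \qquad \pi_1 = \frac{\operatorname{Cov}(x_i, z_i)}{\operatorname{Var}(x_i)} \]

Here \(\tilde{\beta}_1\) is the slope from the simple regression that omits \(z_i\). The bias term \(\theta \pi_1\) is nonzero only when the omitted variable both affects the ranking (\(\theta \neq 0\)) and is correlated with \(x_i\) (\(\pi_1 \neq 0\)).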

In model 2, a multivariate regression adds controls for differences in the composition of students across institutions. With these controls, a 1-percentage-point increase in the percentage of female Economics freshmen lowers the institution's ranking number by 1.254 positions, holding the other variables constant, slightly more than the raw estimate. This is statistically significant at the 1% level. R2 more than doubles (from 0.157 to 0.352) and the residual standard error falls from 25.678 to 22.750, indicating a better fit. However, important variables are likely still omitted.

In model 3, the regression controls for differences in both student and faculty composition. A 1-percentage-point increase in the percentage of female Economics freshmen now lowers the ranking number by 1.090 positions, holding the other variables constant, again significant at the 1% level. Adding the faculty controls moves the coefficient closer to 0. Adjusted R2 rises from 0.332 to 0.355 and the residual standard error falls slightly, to 22.349. Because R2 mechanically rises whenever regressors are added, the fit statistics alone do not show that the bias is shrinking; the movement of the coefficient toward 0 is, however, consistent with the new controls absorbing part of the omitted variable bias.

Lastly, model 4 also accounts for students' prior preparation, measured by freshman SAT Math scores, which affects who is admitted to Economics. In this model, a 1-percentage-point increase in the percentage of female Economics freshmen lowers the institution's ranking number by 0.578 positions, holding the other variables constant. Although this estimate is significant only at the 5% level, the fit improves substantially: R2 rises to 0.599 (adjusted R2 0.573) and the residual standard error falls to 18.190. The SAT Math coefficient is highly significant (-0.272), and adding it roughly halves the coefficient of interest, which suggests that part of the raw association reflects differences in students' prior preparation. Because a lower ranking number denotes a higher-ranked institution, the negative coefficient means that institutions with a larger share of female Economics freshmen tend to be ranked higher, even though, on average, fewer Economics and STEM degrees were granted to women than to men across the sample.
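As a worked example of the model 4 estimate, take a hypothetical 10-percentage-point increase in the percentage of female Economics freshmen, holding the controls fixed:

\[ \Delta \widehat{\text{Academic Ranking}} = \hat{\beta}_1 \cdot \Delta \text{Percentage of Female Econ Freshmen} = -0.578 \times 10 = -5.78 \]

The predicted ranking number falls by about 5.8 positions; since lower numbers denote higher rankings, this is a move up the ranking. Using the normal approximation, a rough 95% confidence interval for \(\hat{\beta}_1\) is \(-0.578 \pm 1.96 \times 0.223\), or about \([-1.015, -0.141]\).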

## 
## Summary Statistics Table of Models
## ====================================================================================================================
##                                                              Dependent variable:                                    
##                          -------------------------------------------------------------------------------------------
##                                                               Academic_ranking                                      
##                               No Controls       +Student Demographics   +Faculty Demographic  +Intelligence Controls
##                                   (1)                    (2)                    (3)                    (4)          
## --------------------------------------------------------------------------------------------------------------------
## Freshmen Econ Female %         -1.143***              -1.254***              -1.090***               -0.578**       
##                                 (0.270)                (0.251)                (0.259)                (0.223)        
##                                                                                                                     
## Student Yield Female                                  -0.732***              -0.642***               -0.278*        
##                                                        (0.169)                (0.170)                (0.148)        
##                                                                                                                     
## Freshmen Female %                                       1.090*                 0.032                  -0.112        
##                                                        (0.580)                (0.789)                (0.642)        
##                                                                                                                     
## Econ Faculty Female %                                                          0.408                  0.370*        
##                                                                               (0.271)                (0.220)        
##                                                                                                                     
## Faculty Female %                                                               1.248*                 0.285         
##                                                                               (0.746)                (0.623)        
##                                                                                                                     
## SAT Math Freshmen Scores                                                                            -0.272***       
##                                                                                                      (0.039)        
##                                                                                                                     
## Constant                       82.053***               54.294*                 43.106               250.843***      
##                                 (8.497)                (30.896)               (30.757)               (39.089)       
##                                                                                                                     
## --------------------------------------------------------------------------------------------------------------------
## Observations                       98                     98                     98                     98          
## R2                               0.157                  0.352                  0.388                  0.599         
## Adjusted R2                      0.148                  0.332                  0.355                  0.573         
## Residual Std. Error         25.678 (df = 96)       22.750 (df = 94)       22.349 (df = 92)       18.190 (df = 91)   
## F Statistic              17.914*** (df = 1; 96) 17.041*** (df = 3; 94) 11.675*** (df = 5; 92) 22.666*** (df = 6; 91)
## ====================================================================================================================
## Note:                                                                                    *p<0.1; **p<0.05; ***p<0.01

Finally, to check for multicollinearity in the final model, I computed the Variance Inflation Factor (VIF), a measure of multicollinearity, for each variable in model 4. From the table below, every VIF is below 5, which indicates low levels of multicollinearity.
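The VIF has a simple definition: for predictor \(j\), \(\text{VIF}_j = 1 / (1 - R_j^2)\), where \(R_j^2\) is the R2 from regressing that predictor on all the other predictors. The base-R sketch below computes it directly on synthetic data (not the paper's variables); for a fitted `lm` with only numeric predictors, `car::vif()` from the `car` package loaded earlier should return the same values.

```r
# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing predictor j
# on all remaining predictors.
vif_manual <- function(X) {
  X <- as.data.frame(X)
  vapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    f  <- reformulate(others, response = v)
    r2 <- summary(lm(f, data = X))$r.squared
    1 / (1 - r2)
  }, numeric(1))
}

# Synthetic predictors: x3 is deliberately close to a copy of x1
set.seed(1)
n  <- 98
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + rnorm(n, sd = 0.3)

vifs <- vif_manual(data.frame(x1, x2, x3))
vifs
vifs < 5   # the rule-of-thumb threshold used in the text
```

Here `x1` and `x3` should exceed the threshold while the independent `x2` stays near 1, which is the pattern a VIF table is meant to flag.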

Conclusion

Overall, the multivariate OLS regressions indicate that a higher percentage of female Economics freshmen is associated with a lower ranking number, which, because lower numbers denote higher-ranked institutions, corresponds to a better academic ranking.