Introduction
Alumni donations serve as a vital revenue stream for colleges and universities. Analyzing the impact of various factors on donation rates can help predict a university’s revenue for a given year and inform strategies to increase alumni contributions. Identifying the key drivers behind alumni giving can guide administrators in implementing policies to boost donations and, consequently, overall revenue.
For instance, studies suggest that students who report higher satisfaction with their interactions with faculty are more likely to graduate and contribute financially to their alma mater. Other influential factors may include whether the institution is public or private, its national ranking, or even its location. Ivy League schools, for example, consistently report higher alumni donation rates, highlighting the potential influence of prestige and regional demographics.
This report investigates the effects of these variables on alumni donation rates. Through exploratory data analysis and the development of a linear regression model, the report quantifies these relationships. Model diagnostics are employed to identify and address potential flaws, ensuring the robustness of the findings.
Packages Used
library(knitr)
library(tidyverse)
library(ggpubr)
library(broom)
library(DT)
library(car)
library(ggplot2)
opts_chunk$set(message = FALSE, warning = FALSE, echo=FALSE)
This report analyzes donation data from 48 national universities, sourced from America’s Best Colleges, Year 2024 Edition. The dataset has been enhanced by including additional information, such as the state where each school is located and its ranking from U.S. News & World Report.
Variable Descriptions
Variables | Description |
---|---|
school | School name |
percent_of_classes_under_20 | Percentage of classes with fewer than 20 students |
student_faculty_ratio | Student-to-faculty ratio |
alumni_giving_rate | Alumni donation percentage |
private | Private or public school designation |
state | State location of the school |
ranking | 2018 US News & World Report ranking |
Univariate Analysis
The model’s response variable is the alumni_giving_rate. A summary of the predictor variables is provided below.
percent_of_classes_under_20
The percentage of classes with fewer than 20 students ranges from 29 to 77, with an average of 55.73 across all schools.
The univariate regression model for this variable produces the following coefficients. Given the low p-value, this variable is likely to have a significant influence on the alumni giving rate.
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.3860676 6.5654723 -1.124986 2.664307e-01
## percent_of_classes_under_20 0.6577687 0.1147048 5.734448 7.228121e-07
For every unit increase in the percentage of classes with fewer than 20 students, the donation rate increases by 0.66 percentage points.
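A coefficient table of this kind can be produced with `lm()` and `summary()`. The sketch below is only illustrative: `alumni_demo` is a synthetic stand-in (the values are made up, not the report's data), and the object names are assumptions.

```r
# Synthetic stand-in for the alumni data; the values are made up for illustration.
set.seed(1)
alumni_demo <- data.frame(
  percent_of_classes_under_20 = runif(48, min = 29, max = 77)
)
alumni_demo$alumni_giving_rate <-
  -7 + 0.65 * alumni_demo$percent_of_classes_under_20 + rnorm(48, sd = 9)

# Univariate regression of the response on a single predictor.
fit_uni <- lm(alumni_giving_rate ~ percent_of_classes_under_20, data = alumni_demo)

# Coefficient table with columns Estimate, Std. Error, t value, Pr(>|t|).
coef_table <- summary(fit_uni)$coefficients
print(coef_table)

# The slope is the expected change in the giving rate for a
# one-point increase in the predictor.
slope <- coef_table["percent_of_classes_under_20", "Estimate"]
```

The same pattern applies to each of the other numeric predictors by swapping the variable name in the formula.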
student_faculty_ratio
The student-faculty ratio ranges from 3 to 23. On average, there are 11.54 students per faculty member.
The results of the univariate model are presented below. This variable is expected to have a significant impact on the response variable.
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 53.013827 3.421450 15.494548 7.058813e-20
## student_faculty_ratio -2.057155 0.273716 -7.515653 1.544232e-09
For every unit increase in the student-faculty ratio, the donation rate decreases by 2.06 percentage points.
private
The dataset includes 33 private schools and 15 public schools.
The graph below, along with the regression model coefficients, clearly indicates that a school’s private or public status significantly impacts the alumni donation rate.
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.66667 2.540545 6.166655 1.628283e-07
## private1 19.78788 3.064013 6.458158 5.938532e-08
The summary above indicates that public schools have an average alumni giving rate of 15.67%. Private schools average 19.79 percentage points more, or about 35.45%.
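With a single two-level factor as the predictor, the intercept is the mean of the reference group and the factor coefficient is the difference between the group means. The sketch below checks this on synthetic data (made-up values; only the 15/33 split mirrors the report), assuming `private` is coded as a factor with levels 0 and 1.

```r
# Synthetic data: 15 public (0) and 33 private (1) schools; values are made up.
set.seed(2)
alumni_demo <- data.frame(private = factor(rep(c(0, 1), times = c(15, 33))))
alumni_demo$alumni_giving_rate <-
  ifelse(alumni_demo$private == "1", 35, 16) + rnorm(48, sd = 10)

fit_private <- lm(alumni_giving_rate ~ private, data = alumni_demo)
b <- coef(fit_private)

# Group means computed directly from the data.
group_means <- tapply(alumni_demo$alumni_giving_rate, alumni_demo$private, mean)

# Intercept equals the public-school mean (reference level 0).
public_mean_from_model <- unname(b["(Intercept)"])
# Intercept plus the private1 coefficient equals the private-school mean.
private_mean_from_model <- unname(b["(Intercept)"] + b["private1"])

print(rbind(
  from_model = c(public_mean_from_model, private_mean_from_model),
  from_data  = unname(group_means)
))
```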
state
The dataset includes schools from 25 states. While the first graph highlights significant variation in average donation rates across states, the second graph reveals that the sample size for each state is too small to draw reliable conclusions. Therefore, it would not be advisable to include this variable in the final model.
ranking
There is a clear linear relationship between university rankings and alumni donations, with higher-ranked schools being more likely to receive donations from their alumni. This relationship is further supported by the coefficients of the univariate model presented below.
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.9522031 3.1247117 12.785885 9.478404e-17
## ranking -0.4290425 0.1057517 -4.057075 1.905152e-04
For every unit improvement in rank (a decrease in rank value), the alumni donation rate increases by an average of 0.43 percentage points.
Modelling
Basic Model
The basic model incorporates all variables identified in the univariate analysis as significant influencers of the alumni donation rate.
##
## Call:
## lm(formula = alumni_giving_rate ~ student_faculty_ratio + percent_of_classes_under_20 +
## private + ranking, data = alumni)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.5141 -6.1273 0.1021 5.4616 22.5710
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 50.02797 13.92134 3.594 0.000834 ***
## student_faculty_ratio -1.18219 0.48946 -2.415 0.020047 *
## percent_of_classes_under_20 -0.14725 0.19054 -0.773 0.443871
## private1 10.57281 5.33046 1.983 0.053724 .
## ranking -0.24806 0.09795 -2.532 0.015061 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.55 on 43 degrees of freedom
## Multiple R-squared: 0.6299, Adjusted R-squared: 0.5954
## F-statistic: 18.29 on 4 and 43 DF, p-value: 7.631e-09
Based on the p-values of the coefficients, the variables percent_of_classes_under_20 and private do not appear to significantly influence the model at the 5% level, despite showing significance during the univariate analyses. This discrepancy is likely due to multicollinearity between these variables. The gap between the \(R^2\) and the \(\text{Adjusted } R^2\) is consistent with this explanation, since it suggests that some predictors contribute little once the others are in the model. Additionally, the mean squared error (MSE) of the model is calculated as 68.33.
## student_faculty_ratio percent_of_classes_under_20
## 3.624713 4.063576
## private ranking
## 4.008727 1.595735
However, an examination of the Variance Inflation Factor (VIF) reveals no values exceeding 10, so by that common rule of thumb multicollinearity is not severe. Because neither variable is significant at the 5% level in the full model, both are excluded for now and will be revisited later.
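For a predictor with one degree of freedom, the VIF is \(1 / (1 - R_j^2)\), where \(R_j^2\) comes from regressing that predictor on all the others. The sketch below computes this by hand in base R on synthetic, deliberately correlated predictors (made-up values). For single-degree-of-freedom terms such as these, `car::vif()` on the fitted model should report the same quantity.

```r
# Synthetic predictors; two of them are deliberately correlated. Values are made up.
set.seed(3)
n <- 48
student_faculty_ratio <- runif(n, min = 3, max = 23)
percent_of_classes_under_20 <- 80 - 2 * student_faculty_ratio + rnorm(n, sd = 6)
ranking <- sample(1:50, n, replace = TRUE)
predictors <- data.frame(student_faculty_ratio, percent_of_classes_under_20, ranking)

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing predictor j on the rest.
vif_manual <- sapply(names(predictors), function(v) {
  others <- setdiff(names(predictors), v)
  r2 <- summary(lm(reformulate(others, response = v), data = predictors))$r.squared
  1 / (1 - r2)
})

print(round(vif_manual, 2))
```

Values near 1 indicate a predictor that is nearly uncorrelated with the others, while larger values indicate that its coefficient's variance is inflated by overlap with them.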
Variable Selection
With the two variables removed, the summary of the new model is as follows.
##
## Call:
## lm(formula = alumni_giving_rate ~ student_faculty_ratio + ranking,
## data = alumni)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.686 -6.868 -0.663 4.931 21.453
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 54.61430 3.36264 16.242 < 2e-16 ***
## student_faculty_ratio -1.77432 0.29205 -6.075 2.41e-07 ***
## ranking -0.19541 0.08809 -2.218 0.0316 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.738 on 45 degrees of freedom
## Multiple R-squared: 0.5954, Adjusted R-squared: 0.5774
## F-statistic: 33.11 on 2 and 45 DF, p-value: 1.439e-09
After retaining only the significant variables in the model, there is no improvement: the \(\text{Adjusted } R^2\) falls slightly from 0.60 to 0.58, and the \(MSE\) rises from 68.33 to 74.69.
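The adjusted values in both summaries follow from the standard formula, where \(n\) is the number of observations (48 here) and \(p\) is the number of predictors:

\[\text{Adjusted } R^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}\]

For the four-predictor model this gives \(1 - (1 - 0.6299) \cdot 47/43 \approx 0.595\), and for the two-predictor model \(1 - (1 - 0.5954) \cdot 47/45 \approx 0.577\), matching the reported summaries up to rounding.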
Residual Diagnostics
The following residual analysis plots are used to check the fit of the model.
Applying a logarithmic transformation to the ranking variable can help address the exponential relationship observed.
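Residual plots of this kind can be drawn with base R's `plot()` method for `lm` objects. The sketch below is an assumption about how such diagnostics might be produced, not the report's own code, and it uses synthetic data (made-up values) in which the response depends on the logarithm of `ranking`.

```r
# Synthetic data with a nonlinear (logarithmic) effect of ranking; values are made up.
set.seed(4)
n <- 48
demo <- data.frame(
  student_faculty_ratio = runif(n, min = 3, max = 23),
  ranking = sample(1:50, n)
)
demo$alumni_giving_rate <-
  60 - 1.5 * demo$student_faculty_ratio - 5 * log(demo$ranking) + rnorm(n, sd = 8)

# Fit with ranking entered linearly, as in the untransformed model.
fit <- lm(alumni_giving_rate ~ student_faculty_ratio + ranking, data = demo)

pdf(NULL)                      # discard graphics when run non-interactively; omit in a session
par(mfrow = c(1, 3))
plot(fit, which = 1)           # residuals vs fitted values
plot(fit, which = 2)           # normal Q-Q plot of standardized residuals
plot(demo$ranking, resid(fit), # residuals against one predictor
     xlab = "ranking", ylab = "Residuals")
abline(h = 0, lty = 2)
dev.off()
```

Curvature in the residuals-versus-predictor panel is the kind of pattern that motivates transforming that predictor.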
Log Transformation
The model after the transformation is summarized below.
##
## Call:
## lm(formula = alumni_giving_rate ~ student_faculty_ratio + ranking2,
## data = alumni)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.9234 -7.0609 -0.5318 4.5194 21.8762
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 61.8072 4.1053 15.056 < 2e-16 ***
## student_faculty_ratio -1.5438 0.2938 -5.254 3.93e-06 ***
## ranking2 -5.0562 1.5422 -3.278 0.00202 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.268 on 45 degrees of freedom
## Multiple R-squared: 0.6377, Adjusted R-squared: 0.6216
## F-statistic: 39.6 on 2 and 45 DF, p-value: 1.2e-10
It is observed that the \(\text{Adjusted } R^2\) increases to 0.62 and the \(MSE\) decreases to 66.88.
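The report does not show how `ranking2` was constructed. A minimal sketch, assuming it is the natural logarithm of `ranking` (consistent with the `log(ranking)` term in the summary equation) and using synthetic data with made-up values:

```r
# Synthetic data; values are made up for illustration.
set.seed(5)
n <- 48
demo <- data.frame(
  student_faculty_ratio = runif(n, min = 3, max = 23),
  ranking = sample(1:50, n)
)
demo$alumni_giving_rate <-
  60 - 1.5 * demo$student_faculty_ratio - 5 * log(demo$ranking) + rnorm(n, sd = 8)

# Assumed definition of the transformed predictor: natural log of ranking.
demo$ranking2 <- log(demo$ranking)

fit_raw <- lm(alumni_giving_rate ~ student_faculty_ratio + ranking,  data = demo)
fit_log <- lm(alumni_giving_rate ~ student_faculty_ratio + ranking2, data = demo)

# Compare adjusted R-squared before and after the transformation.
adj_r2 <- c(raw = summary(fit_raw)$adj.r.squared,
            log = summary(fit_log)$adj.r.squared)
print(round(adj_r2, 4))
```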
Residual Diagnostics
Box-Cox Transformation
The model can be further refined by applying a Box-Cox transformation to the response variable. The updated model is presented below.
##
## Call:
## lm(formula = alumni_giving_rate2 ~ student_faculty_ratio + ranking2,
## data = alumni)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.7454 -1.2728 -0.0606 0.8503 3.9552
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.53793 0.78455 18.530 < 2e-16 ***
## student_faculty_ratio -0.33267 0.05616 -5.924 4.05e-07 ***
## ranking2 -0.75022 0.29474 -2.545 0.0144 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.58 on 45 degrees of freedom
## Multiple R-squared: 0.6415, Adjusted R-squared: 0.6255
## F-statistic: 40.26 on 2 and 45 DF, p-value: 9.481e-11
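The \(\text{Adjusted } R^2\) improves slightly to 0.63, and the \(MSE\) is now 2.44. Because the response has been transformed, this \(MSE\) is measured on the transformed scale and is not directly comparable with the values from the earlier models.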
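The report does not show how the Box-Cox exponent was chosen. One common route, sketched here as an assumption, is `MASS::boxcox()`, which profiles the log-likelihood over a grid of \(\lambda\) values. The data are synthetic (made-up values), and `box_cox()` is a hypothetical helper written for this example.

```r
# Synthetic data with a strictly positive response; values are made up.
set.seed(6)
n <- 48
demo <- data.frame(
  student_faculty_ratio = runif(n, min = 3, max = 23),
  ranking2 = log(sample(1:50, n))
)
demo$alumni_giving_rate <- pmax(
  1,
  60 - 1.5 * demo$student_faculty_ratio - 5 * demo$ranking2 + rnorm(n, sd = 6)
)

fit <- lm(alumni_giving_rate ~ student_faculty_ratio + ranking2, data = demo)

# Profile log-likelihood over a grid of lambda values; keep the maximizer.
bc <- MASS::boxcox(fit, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# Hypothetical helper: the Box-Cox transform, with the log as the lambda = 0 case.
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

demo$alumni_giving_rate2 <- box_cox(demo$alumni_giving_rate, lambda_hat)
fit_bc <- lm(alumni_giving_rate2 ~ student_faculty_ratio + ranking2, data = demo)
print(lambda_hat)
```

Calling `MASS::boxcox()` with the namespace prefix avoids attaching MASS, whose `select()` would otherwise mask the dplyr function loaded with tidyverse.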
Residual Diagnostics
The same residual diagnostics are applied to the new model.
Variable Selection 2
Insert variable private
Returning to the variables that were set aside earlier, private is added to the final model and is found to improve the fit.
##
## Call:
## lm(formula = alumni_giving_rate2 ~ student_faculty_ratio + ranking2 +
## private, data = alumni)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.2797 -1.0513 -0.0611 0.8386 3.0732
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.49035 1.42672 8.054 3.38e-10 ***
## student_faculty_ratio -0.16314 0.08612 -1.894 0.06478 .
## ranking2 -0.85361 0.28194 -3.028 0.00411 **
## private1 2.02452 0.80936 2.501 0.01617 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.495 on 44 degrees of freedom
## Multiple R-squared: 0.6861, Adjusted R-squared: 0.6647
## F-statistic: 32.06 on 3 and 44 DF, p-value: 3.825e-11
The \(\text{Adjusted } R^2\) increases again to 0.66 and the \(MSE\) falls further to 2.14. However, the student_faculty_ratio variable is no longer significant at the 5% level (p = 0.065). The gain in \(\text{Adjusted } R^2\) and the reduction in \(MSE\) are not large enough to justify accepting the new model.
Insert variable percent_of_classes_under_20
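Adding percent_of_classes_under_20 has the opposite effect: the \(\text{Adjusted } R^2\) decreases to 0.62, the residual standard error rises slightly from 1.58 to 1.597, and the coefficient is not significant (p = 0.78).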
##
## Call:
## lm(formula = alumni_giving_rate2 ~ student_faculty_ratio + ranking2 +
## percent_of_classes_under_20, data = alumni)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.6191 -1.2798 -0.0826 0.8720 3.9565
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.813207 2.693147 5.129 6.29e-06 ***
## student_faculty_ratio -0.317378 0.078545 -4.041 0.000211 ***
## ranking2 -0.723357 0.312705 -2.313 0.025445 *
## percent_of_classes_under_20 0.008434 0.029953 0.282 0.779590
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.597 on 44 degrees of freedom
## Multiple R-squared: 0.6421, Adjusted R-squared: 0.6177
## F-statistic: 26.31 on 3 and 44 DF, p-value: 6.642e-10
Summary
The final model indicates that the alumni_giving_rate is influenced by two key variables: ranking and student_faculty_ratio. The regression equation is as follows:
\[\text{alumniGivingRate}^{0.505} - 1.98 = 14.538 - 0.333 \cdot \text{studentFacultyRatio} - 0.750 \cdot \log(\text{ranking})\]
The model achieves an \(\text{Adjusted } R^2\) of 0.6255, indicating that about 62.55% of the variability in the transformed alumni_giving_rate is explained by the predictors. Its mean squared error of 2.44 is low, although it is measured on the transformed scale.
The negative coefficients for both predictor variables suggest an inverse relationship, which aligns with the negative linear patterns observed during the exploratory analysis. This indicates that as the student_faculty_ratio increases or the ranking worsens (higher rank number), the alumni_giving_rate decreases.