Introduction

Alumni donations serve as a vital revenue stream for colleges and universities. Analyzing the impact of various factors on donation rates can help predict a university’s revenue for a given year and inform strategies to increase alumni contributions. Identifying the key drivers behind alumni giving can guide administrators in implementing policies to boost donations and, consequently, overall revenue.

For instance, studies suggest that students who report higher satisfaction with their interactions with faculty are more likely to graduate and contribute financially to their alma mater. Other influential factors may include whether the institution is public or private, its national ranking, or even its location. Ivy League schools, for example, consistently report higher alumni donation rates, highlighting the potential influence of prestige and regional demographics.

This report investigates the effects of these variables on alumni donation rates. Through exploratory data analysis and the development of a linear regression model, the report quantifies these relationships. Model diagnostics are employed to identify and address potential flaws, ensuring the robustness of the findings.

Packages Used

library(knitr)
library(tidyverse)
library(ggpubr)
library(broom)
library(DT)
library(car)
library(ggplot2)

opts_chunk$set(message = FALSE, warning = FALSE, echo=FALSE)

This report analyzes donation data from 48 national universities, sourced from America’s Best Colleges, Year 2000 Edition. The dataset has been enhanced by including additional information, such as the state where each school is located and its ranking from U.S. News & World Report.

Variable Descriptions

Variable                      Description
school                        School name
percent_of_classes_under_20   Percentage of classes with fewer than 20 students
student_faculty_ratio         Student-to-faculty ratio
alumni_giving_rate            Alumni donation percentage
private                       Private or public school designation
state                         State location of the school
ranking                       2018 U.S. News & World Report ranking

Univariate Analysis

The model’s response variable is the alumni_giving_rate. A summary of the predictor variables is provided below.

percent_of_classes_under_20

The percentage of classes with fewer than 20 students ranges from 29 to 77, with an average of 55.73 across all schools.

The univariate regression model for this variable produces the following coefficients. The low p-value of the slope (about 7.2e-07) indicates a statistically significant association with the alumni giving rate.

##                               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)                 -7.3860676  6.5654723 -1.124986 2.664307e-01
## percent_of_classes_under_20  0.6577687  0.1147048  5.734448 7.228121e-07

For every unit increase in the percentage of classes with fewer than 20 students, the donation rate increases by 0.66 percentage points.

student_faculty_ratio

The student-faculty ratio ranges from 3:1 to 23:1. On average, there are 11.54 students per faculty member.

The results of the univariate model are presented below. This variable is expected to have a significant impact on the response variable.

##                        Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)           53.013827   3.421450 15.494548 7.058813e-20
## student_faculty_ratio -2.057155   0.273716 -7.515653 1.544232e-09

For every unit increase in the student-faculty ratio, the donation rate decreases by 2.06 percentage points.
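As a quick check of this interpretation, the fitted line can be evaluated at the two ends of the observed range (ratios of 3 and 23). The sketch below hard-codes the coefficients from the output above; the helper name `predict_rate` is illustrative and not part of the original analysis.

```r
# Coefficients copied from the univariate model output above
intercept <- 53.013827
slope     <- -2.057155

# Hypothetical helper: fitted alumni giving rate for a given student-faculty ratio
predict_rate <- function(ratio) intercept + slope * ratio

# Fitted rates at the smallest (3) and largest (23) observed ratios
fitted_low  <- predict_rate(3)
fitted_high <- predict_rate(23)
round(c(fitted_low, fitted_high), 2)
```

The gap between the two fitted values is 20 times the slope, which is the sense in which each additional student per faculty member costs about 2.06 percentage points.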

private

The dataset includes 33 private schools and 15 public schools.

The graph below, along with the regression model coefficients, indicates that a school’s private or public status is strongly associated with the alumni donation rate.

##             Estimate Std. Error  t value     Pr(>|t|)
## (Intercept) 15.66667   2.540545 6.166655 1.628283e-07
## private1    19.78788   3.064013 6.458158 5.938532e-08

The summary above indicates that public schools have an average alumni giving rate of 15.67%. Private schools average 19.79 percentage points more, or about 35.45%.
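Because private is a 0/1 indicator, the two coefficients translate directly into group means. The following sketch reproduces that arithmetic from the reported coefficients and school counts; the variable names are illustrative.

```r
# Coefficients copied from the model output above
b0 <- 15.66667   # intercept: mean giving rate when private = 0 (public)
b1 <- 19.78788   # private1: private-minus-public difference in means

public_mean  <- b0
private_mean <- b0 + b1

# Overall mean implied by the group sizes reported above (33 private, 15 public)
overall_mean <- (33 * private_mean + 15 * public_mean) / (33 + 15)
round(c(public = public_mean, private = private_mean, overall = overall_mean), 2)
```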

state

The dataset includes schools from 25 states. While the first graph highlights significant variation in average donation rates across states, the second graph reveals that the sample size for each state is too small to draw reliable conclusions. Therefore, it would not be advisable to include this variable in the final model.

ranking

There is a clear linear relationship between university ranking and alumni giving rate, with higher-ranked schools tending to have higher giving rates. This relationship is further supported by the coefficients of the univariate model presented below.

##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 43.1998639 2.90195144 14.886488 3.298133e-19
## ranking     -0.5422494 0.09647412 -5.620672 1.068157e-06

For every unit improvement in rank (a decrease in rank value), the alumni donation rate increases by an average of 0.54 percentage points.

Modelling

Basic Model

The basic model incorporates all variables identified as significant influencers of the alumni donation rate in the univariate analysis.

## 
## Call:
## lm(formula = alumni_giving_rate ~ student_faculty_ratio + percent_of_classes_under_20 + 
##     private + ranking, data = alumni)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.7557  -5.5338  -0.5836   4.7865  22.3672 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  51.8087    14.5213   3.568   0.0009 ***
## student_faculty_ratio        -1.2424     0.4911  -2.530   0.0152 *  
## percent_of_classes_under_20  -0.1100     0.1881  -0.585   0.5618    
## private1                      6.7788     5.1073   1.327   0.1914    
## ranking                      -0.2619     0.1119  -2.341   0.0240 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.632 on 43 degrees of freedom
## Multiple R-squared:  0.6227, Adjusted R-squared:  0.5876 
## F-statistic: 17.74 on 4 and 43 DF,  p-value: 1.138e-08

Based on the p-values of the coefficients, the variables percent_of_classes_under_20 and private do not contribute significantly to the model, despite being significant in the univariate analyses. This discrepancy is likely due to correlation among the predictors (multicollinearity). The gap between the \(R^2\) (0.6227) and the \(\text{Adjusted } R^2\) (0.5876) is consistent with the model carrying predictors that add little explanatory power. Additionally, the mean squared error (MSE) of the model is calculated as 69.65.

##       student_faculty_ratio percent_of_classes_under_20 
##                    3.580330                    3.886828 
##                     private                     ranking 
##                    3.610477                    1.976081

However, an examination of the Variance Inflation Factor (VIF) reveals no values exceeding 10 (the largest is 3.89), so by this common rule of thumb multicollinearity is not severe. Because the two variables are not significant in the full model, they are excluded for now and revisited later.
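To see what these VIF values imply, recall the standard definition \(VIF_j = 1 / (1 - R_j^2)\), where \(R_j^2\) is the \(R^2\) from regressing predictor \(j\) on the other predictors. The sketch below inverts that formula for the reported values; the vector names are illustrative.

```r
# VIF values copied from the output above
vif_values <- c(
  student_faculty_ratio       = 3.580330,
  percent_of_classes_under_20 = 3.886828,
  private                     = 3.610477,
  ranking                     = 1.976081
)

# Invert VIF_j = 1 / (1 - R_j^2) to recover R_j^2 for each predictor
r2_given_others <- 1 - 1 / vif_values
round(r2_given_others, 3)
```

All implied \(R_j^2\) values sit well below 0.9, the level that corresponds to a VIF of 10, although three of the four predictors still share a substantial part of their variation with the others.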

Variable Selection

With the two variables removed, the summary of the new model is shown below.

## 
## Call:
## lm(formula = alumni_giving_rate ~ student_faculty_ratio + ranking, 
##     data = alumni)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.9086  -6.1546   0.2597   4.4597  21.2957 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           53.74558    3.24880  16.543  < 2e-16 ***
## student_faculty_ratio -1.55767    0.32534  -4.788 1.86e-05 ***
## ranking               -0.25291    0.09978  -2.535   0.0148 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.609 on 45 degrees of freedom
## Multiple R-squared:  0.6072, Adjusted R-squared:  0.5898 
## F-statistic: 34.79 on 2 and 45 DF,  p-value: 7.38e-10

After retaining only the significant variables in the model, there is no improvement in the \(\text{Adjusted } R^2\), which remains at 0.59, or in the \(MSE\), which rises slightly from 69.65 to 72.5.
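The comparison between the two models can be reproduced from the summary output alone. The sketch below recomputes the adjusted \(R^2\) with the usual formula and recovers the residual sum of squares from the residual standard error. The divisor of 46 used for the MSE is an inference: it is the value that matches the figures quoted in this report, not something the report states.

```r
n <- 48  # number of universities in the dataset

# Standard adjusted R-squared for a model with p predictors
adj_r2 <- function(r2, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Residual sum of squares recovered from residual standard error and residual df
rss <- function(rse, df) rse^2 * df

# Values copied from the two model summaries above
adj_full    <- adj_r2(0.6227, p = 4)
adj_reduced <- adj_r2(0.6072, p = 2)

# Assumption: MSE = RSS / 46, the divisor consistent with the reported numbers
mse_full    <- rss(8.632, 43) / 46
mse_reduced <- rss(8.609, 45) / 46

round(c(adj_full, adj_reduced, mse_full, mse_reduced), 4)
```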

Residual Diagnostics

The fit of the model is checked with the following residual analysis plots.

Applying a logarithmic transformation to the ranking variable can help address the exponential relationship observed.

Log Transformation

The new model after the transformation is shown below, where ranking2 denotes the log-transformed ranking.

## 
## Call:
## lm(formula = alumni_giving_rate ~ student_faculty_ratio + ranking2, 
##     data = alumni)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.3976  -5.7894   0.6323   4.2420  21.5566 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            61.1314     3.9543  15.459  < 2e-16 ***
## student_faculty_ratio  -1.4228     0.3136  -4.538 4.21e-05 ***
## ranking2               -5.2561     1.5878  -3.310  0.00184 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.253 on 45 degrees of freedom
## Multiple R-squared:  0.639,  Adjusted R-squared:  0.623 
## F-statistic: 39.83 on 2 and 45 DF,  p-value: 1.103e-10

It is observed that the \(\text{Adjusted } R^2\) increases to 0.62 and the \(MSE\) decreases to 66.63.
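With a log-transformed predictor, the coefficient describes the effect of a proportional change in rank rather than a one-place change. The sketch below works this out for a doubling of the rank number, assuming ranking2 is the natural logarithm of ranking (the default of R's `log()`); the variable names are illustrative.

```r
# Coefficient of ranking2 copied from the model output above
b_log_rank <- -5.2561

# Change in fitted giving rate when the rank number doubles (e.g. 10 -> 20),
# holding student_faculty_ratio fixed; assumes ranking2 = log(ranking)
effect_of_doubling <- b_log_rank * log(2)

# The same change applies to any doubling, e.g. rank 50 -> 100
effect_50_to_100 <- b_log_rank * (log(100) - log(50))

round(c(effect_of_doubling, effect_50_to_100), 2)
```

Under this model, moving from rank 10 to rank 20 is associated with the same drop in giving rate as moving from rank 50 to rank 100, which is what distinguishes it from the untransformed model.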

Residual Diagnostics

Box-Cox Transformation

The model can be further refined by applying a Box-Cox transformation to the response variable. The updated model is presented below.

## 
## Call:
## lm(formula = alumni_giving_rate2 ~ student_faculty_ratio + ranking2, 
##     data = alumni)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.5503 -1.1945  0.1262  0.8061  3.9072 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           14.44137    0.75593  19.104  < 2e-16 ***
## student_faculty_ratio -0.31443    0.05994  -5.246 4.04e-06 ***
## ranking2              -0.78227    0.30354  -2.577   0.0133 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.578 on 45 degrees of freedom
## Multiple R-squared:  0.6426, Adjusted R-squared:  0.6267 
## F-statistic: 40.45 on 2 and 45 DF,  p-value: 8.83e-11

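The \(\text{Adjusted } R^2\) improves slightly to 0.63, and the \(MSE\) falls to 2.43. Because the response is now on the Box-Cox scale, this \(MSE\) is not directly comparable with the values from the earlier models.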
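For reference, a minimal sketch of the transformation and its inverse is given below. It assumes the standard Box-Cox form \((y^\lambda - 1)/\lambda\) and takes \(\lambda = 0.505\), the exponent that appears in the regression equation in the Summary. The function names and the example rates are illustrative, not taken from the dataset.

```r
lambda <- 0.505  # exponent taken from the regression equation in the Summary

# Standard Box-Cox transform for lambda != 0 (assumed form)
box_cox <- function(y, lambda) (y^lambda - 1) / lambda

# Inverse transform, mapping model-scale values back to giving rates
inv_box_cox <- function(z, lambda) (lambda * z + 1)^(1 / lambda)

# Illustrative giving rates (not rows from the dataset)
rates       <- c(10, 30, 60)
transformed <- box_cox(rates, lambda)
recovered   <- inv_box_cox(transformed, lambda)

round(transformed, 2)
```

The transformed values are much smaller in magnitude than the original rates, which is why the residual standard error and MSE shrink so sharply after the transformation.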

Residual Diagnostics

The same residual diagnostics are applied to the new model.

Variable Selection 2

Insert variable private

Returning to the variables that were set aside, private is added to the model and is found to improve the fit.

## 
## Call:
## lm(formula = alumni_giving_rate2 ~ student_faculty_ratio + ranking2 + 
##     private, data = alumni)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2235 -1.2151  0.0255  0.8514  3.1533 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           11.68523    1.47088   7.944 4.85e-10 ***
## student_faculty_ratio -0.17379    0.08707  -1.996  0.05214 .  
## ranking2              -0.80677    0.29216  -2.761  0.00836 ** 
## private1               1.75265    0.81309   2.156  0.03663 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.517 on 44 degrees of freedom
## Multiple R-squared:  0.6767, Adjusted R-squared:  0.6547 
## F-statistic:  30.7 on 3 and 44 DF,  p-value: 7.259e-11

The \(\text{Adjusted } R^2\) increases again to 0.65 and the \(MSE\) falls further to 2.2.

However, the significance of the student_faculty_ratio variable drops (p = 0.052). The gain in \(\text{Adjusted } R^2\) and the reduction in \(MSE\) are not large enough to justify accepting the new model.

Insert variable percent_of_classes_under_20

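Adding percent_of_classes_under_20, by contrast, does not help: the \(\text{Adjusted } R^2\) falls to 0.62, the residual standard error rises from 1.578 to 1.594, and the variable itself is not significant (p = 0.78). It is therefore left out of the model.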

## 
## Call:
## lm(formula = alumni_giving_rate2 ~ student_faculty_ratio + ranking2 + 
##     percent_of_classes_under_20, data = alumni)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.5751 -1.1999  0.1094  0.8352  3.9101 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 13.739852   2.666998   5.152 5.83e-06 ***
## student_faculty_ratio       -0.300137   0.079863  -3.758 0.000501 ***
## ranking2                    -0.755347   0.321999  -2.346 0.023557 *  
## percent_of_classes_under_20  0.008209   0.029902   0.275 0.784957    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.594 on 44 degrees of freedom
## Multiple R-squared:  0.6432, Adjusted R-squared:  0.6189 
## F-statistic: 26.44 on 3 and 44 DF,  p-value: 6.213e-10

Summary

The final model indicates that the alumni_giving_rate is influenced by two key variables: ranking and student_faculty_ratio. The regression equation is as follows:

\[\frac{\text{alumniGivingRate}^{0.505} - 1}{0.505} = 14.441 - (0.314 \times \text{studentFacultyRatio}) - (0.782 \times \log(\text{ranking}))\]

The model achieves an \(\text{Adjusted } R^2\) of 0.6267, indicating that about 63% of the variability in the Box-Cox-transformed alumni_giving_rate is explained by the predictors, after adjusting for the number of predictors. The mean squared error of 2.43 is measured on the transformed scale, so it should not be compared directly with the MSE of the untransformed models.

The negative coefficients for both predictor variables suggest an inverse relationship, which aligns with the negative linear patterns observed during the exploratory analysis. This indicates that as the student_faculty_ratio increases or the ranking worsens (higher rank number), the alumni_giving_rate decreases.
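To illustrate how the final model would be used, the sketch below predicts the giving rate for a hypothetical school and converts the result back to a percentage. It uses the coefficients from the Box-Cox model output, and assumes the standard Box-Cox form with \(\lambda = 0.505\) and a natural-log transform of ranking. The school's values (a ratio of 10 and a rank of 20) are made up for the example.

```r
# Coefficients copied from the Box-Cox model output
b0     <- 14.44137   # intercept
b_sfr  <- -0.31443   # student_faculty_ratio
b_rank <- -0.78227   # ranking2, assumed to be log(ranking)
lambda <- 0.505      # Box-Cox exponent from the regression equation

# Hypothetical school (illustrative values, not a row from the dataset)
student_faculty_ratio <- 10
ranking               <- 20

# Prediction on the transformed scale
z_hat <- b0 + b_sfr * student_faculty_ratio + b_rank * log(ranking)

# Back-transform with the inverse of (y^lambda - 1) / lambda
predicted_rate <- (lambda * z_hat + 1)^(1 / lambda)
round(predicted_rate, 1)
```

Under these assumptions the predicted giving rate comes out at roughly 29%. Worsening either the ratio or the rank lowers the prediction, in line with the negative coefficients.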