Final project data 110

Author

Kenneth Nguyen

Source: https://coredo.eu/dictionary/payment-institution/

Introduction

This project explores AmeriCorps Segal Education Award payments made to colleges, universities, and other institutions in the United States and at international campuses. The dataset used in this analysis, titled “Segal AmeriCorps Education Award Detailed Payments by Institution 2020,” can be found at https://catalog.data.gov/dataset. It was collected and published by AmeriCorps, a United States federal agency focused on national service and volunteerism, and no README accompanies the file. The dataset contains information about institutions that received AmeriCorps education award payments, including payment amounts, institution names, states, campuses, award types, and payment categories. I selected this dataset because education affordability and financial assistance programs are important topics for students. Since many college students rely on financial aid and educational support, I wanted to better understand how AmeriCorps education award money is distributed across institutions and states.

Research questions:
1. How do AmeriCorps education award payments vary across states and institutions?
2. Can variables such as award year, payment type, and country category predict total payment amounts?

The dataset includes the following important variables:

INSTITUTION – Name of the college or university (categorical)
SYSTEM – University system associated with the institution (categorical)
STATE – State abbreviation where payments were sent (categorical)
COUNTRY_CATEGORY – Domestic or international category (categorical)
YEAR – Year the award was earned (quantitative)
EARNED_AWARD_TYPE – Type of award earned (categorical)
PAYMENT_TYPE – Type of payment made (categorical)
TOTAL_PAYMENTS – Dollar amount paid to the institution (quantitative)

Background info

The AmeriCorps program is a national service initiative in the United States that promotes community involvement while providing educational and financial support to participants. One of its most important benefits is the Segal AmeriCorps Education Award, which members earn after successfully completing a term of service. The award can be used to pay for higher education expenses or to repay qualified student loans. The value of a full-time award is connected to the maximum Federal Pell Grant amount for the year in which the service is completed, causing award values to change over time (AmeriCorps). AmeriCorps reports that members have earned billions of dollars in educational awards since the program began. These awards help reduce financial burdens for participants while also directing substantial payments toward colleges and universities where members choose to use their awards (AmeriCorps).

Additionally, the Segal Education Award may be used in several different ways, including tuition payments and qualified student loan repayment. Payment amounts are not always identical because they can vary according to service year, payment type, and institutional factors. Studying these payment patterns may help identify whether characteristics such as award year, payment type, and country category contribute to predicting payment amounts across institutions.

library(tidyverse) # Loads dplyr, ggplot2, and the other core tidyverse packages
library(dplyr)
library(ggplot2)
library(RColorBrewer) # Color palettes used by scale_fill_brewer()
library(highcharter)
setwd("C:/Users/kenne/Downloads") # Set the working directory (this path is specific to my computer)

americorps <- read.csv("Segal_AmeriCorps_Education_Award__Detailed_Payments_by_Institution_2020.csv") # Read the dataset into a data frame

EDA

americorps_clean <- americorps |> # Clean the dataset: drop rows with NAs in any variable used below
  filter(
    !is.na(STATE),
    !is.na(INSTITUTION),
    !is.na(TOTAL_PAYMENTS),
    !is.na(YEAR),
    !is.na(PAYMENT_TYPE),
    !is.na(COUNTRY_CATEGORY)
  )
americorps_clean <- americorps_clean |> # Convert categorical variables to factors so later code treats them as groups
  mutate(STATE = factor(STATE),
         INSTITUTION = factor(INSTITUTION),
         PAYMENT_TYPE = factor(PAYMENT_TYPE),
         COUNTRY_CATEGORY = factor(COUNTRY_CATEGORY),
         YEAR = as.numeric(YEAR) # Store as a number for the model and the graph
  )
summary(americorps_clean) # Counts for factors, class and mode for character columns, quartiles for numeric columns
                      INSTITUTION        SYSTEM             CAMPUS         
 UNIVERSITY OF WISCONSIN    :   935   Length:114187      Length:114187     
 UNIVERSITY OF CALIFORNIA   :   930   Class :character   Class :character  
 CALIFORNIA STATE UNIVERSITY:   680   Mode  :character   Mode  :character  
 UNIVERSITY OF TEXAS        :   422                                        
 NAVIENT                    :   394                                        
 HARVARD UNIVERSITY         :   345                                        
 (Other)                    :110481                                        
   COLLEGE              STATE            COUNTRY_CATEGORY       YEAR     
 Length:114187      CA     :10226   DOMESTIC     :113136   Min.   :1995  
 Class :character   NY     : 7996   INTERNATIONAL:  1051   1st Qu.:2002  
 Mode  :character   PA     : 5893                          Median :2008  
                    TX     : 5666                          Mean   :2008  
                    MA     : 4490                          3rd Qu.:2014  
                    IL     : 4374                          Max.   :2020  
                    (Other):75542                                        
 EARNED_AWARD_TYPE                 PAYMENT_TYPE   TOTAL_PAYMENTS    
 Length:114187      Education Expenses   :75896   Min.   : -186487  
 Class :character   Repay Loans          :24461   1st Qu.:     906  
 Mode  :character   Returns              :12002   Median :    3676  
                    Uncategorized Payment: 1828   Mean   :   27616  
                                                  3rd Qu.:   11813  
                                                  Max.   :26961959  
                                                                    
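The summary shows that TOTAL_PAYMENTS has a negative minimum (-186,487) and that one payment type is labeled Returns, so some rows may record money flowing back rather than payments out. Before modeling, it can help to count negative amounts by payment type. The sketch below uses a small hypothetical data frame (`toy`) with made-up values in place of `americorps_clean`; whether the real negatives sit mainly under Returns is an assumption to check, not a finding.

```r
library(dplyr)

# Hypothetical toy data standing in for americorps_clean (values are made up)
toy <- tibble(
  PAYMENT_TYPE   = factor(c("Education Expenses", "Education Expenses",
                            "Repay Loans", "Returns", "Returns")),
  TOTAL_PAYMENTS = c(1200, 3500, 8000, -450, -90)
)

# Count negative amounts and sum them within each payment type
negative_check <- toy |>
  group_by(PAYMENT_TYPE) |>
  summarise(
    n_rows     = n(),
    n_negative = sum(TOTAL_PAYMENTS < 0),
    neg_total  = sum(TOTAL_PAYMENTS[TOTAL_PAYMENTS < 0]),
    .groups = "drop"
  )

print(negative_check)
```

On the real data, replacing `toy` with `americorps_clean` gives the same table for all 114,187 rows.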

Research Question 1

How do AmeriCorps payments vary across states and institutions?

state_payment <- americorps_clean |> # Total and average payments by state
  group_by(STATE) |>
  summarise(
    Total_Payments = sum(TOTAL_PAYMENTS),
    Average_Payment = mean(TOTAL_PAYMENTS),
    Count = n(),
    .groups = "drop" # Note the leading dot: this drops the grouping instead of adding a column
  ) |>
  arrange(desc(Total_Payments)) |> # Largest total first
  slice(1:10) # Keep only the top 10 states by total payments

state_payment
# A tibble: 10 × 4
   STATE Total_Payments Average_Payment Count
   <fct>          <dbl>           <dbl> <int>
 1 PA        423676144.          71895.  5893
 2 NY        397808433.          49751.  7996
 3 GA        387688074.         133180.  2911
 4 WI        245423791.          94831.  2588
 5 CA        231279433.          22617. 10226
 6 TX        100753633.          17782.  5666
 7 MA         85114889.          18957.  4490
 8 IL         78102887.          17856.  4374
 9 WA         74958219.          23308.  3216
10 MD         69474647.          32239.  2155
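In dplyr, the argument of `summarise()` that controls regrouping is `.groups`, with a leading dot. A name without the dot, such as `groups`, is treated as one more summary column whose value is the string "drop". The toy example below (made-up states and amounts) shows the difference. It also checks how the three summary columns relate: each total equals the average times the count. For PA, 71,895 × 5,893 ≈ 423.7 million.

```r
library(dplyr)

# Hypothetical toy payments for two states (values are made up)
toy <- tibble(
  STATE          = factor(c("PA", "PA", "PA", "MD", "MD")),
  TOTAL_PAYMENTS = c(100, 250, 150, 80, 120)
)

# Without the dot, "groups" becomes an extra output column filled with "drop"
with_typo <- toy |>
  group_by(STATE) |>
  summarise(Total_Payments = sum(TOTAL_PAYMENTS), groups = "drop")

# With .groups, grouping is dropped and no extra column is created
fixed <- toy |>
  group_by(STATE) |>
  summarise(
    Total_Payments  = sum(TOTAL_PAYMENTS),
    Average_Payment = mean(TOTAL_PAYMENTS),
    Count           = n(),
    .groups = "drop"
  ) |>
  arrange(desc(Total_Payments))

names(with_typo)  # includes "groups"
names(fixed)      # STATE, Total_Payments, Average_Payment, Count
```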
institution_payment <- americorps_clean |> # Same approach as above, grouped by institution instead of state
  group_by(INSTITUTION) |>
  summarise(
    Total_Payments = sum(TOTAL_PAYMENTS),
    .groups = "drop"
  ) |>
  arrange(desc(Total_Payments)) |>
  slice(1:10)

institution_payment
# A tibble: 10 × 2
   INSTITUTION                                             Total_Payments
   <fct>                                                            <dbl>
 1 UNITED STATES DEPARTMENT OF EDUCATION                       250293353.
 2 NAVIENT                                                     210641458.
 3 GREAT LAKES HIGHER EDUCATION CORPORATION AND AFFILIATES     205718331.
 4 FEDLOAN SERVICING                                           160648767.
 5 NELNET                                                      147912936.
 6 ACS EDUCATION SERVICES                                       54378216.
 7 AMERICAN EDUCATION SERVICES                                  53497538.
 8 RELAY GRADUATE SCHOOL OF EDUCATION                           41678567.
 9 JOHNS HOPKINS UNIVERSITY                                     39440285.
10 LOYOLA MARYMOUNT UNIVERSITY                                  33253720.
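The institution table suggests that payments are concentrated among a handful of recipients. One simple way to quantify that is the share of all payments going to the top k recipients. The sketch below computes this share on hypothetical toy data; `top_k_share()` is an illustrative helper, not part of dplyr, and calling it on `americorps_clean` would be the real check.

```r
library(dplyr)

# Hypothetical helper: share of the grand total received by the k largest recipients
top_k_share <- function(data, k) {
  totals <- data |>
    group_by(INSTITUTION) |>
    summarise(Total_Payments = sum(TOTAL_PAYMENTS), .groups = "drop") |>
    arrange(desc(Total_Payments))
  sum(head(totals$Total_Payments, k)) / sum(totals$Total_Payments)
}

# Made-up payments for five institutions (grand total = 1000)
toy <- tibble(
  INSTITUTION    = c("A", "A", "B", "C", "D", "E"),
  TOTAL_PAYMENTS = c(400, 200, 200, 100, 50, 50)
)

top_k_share(toy, 1)  # A holds 600 of 1000
top_k_share(toy, 2)  # A and B hold 800 of 1000
```

Because the real data includes negative amounts, the denominator is a net total, so shares computed on it should be read with that in mind.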

Visualization 1

options(scipen = 50) # Print large numbers in full instead of scientific notation
plot1 <- ggplot(
  state_payment,
  aes(x = reorder(STATE, Total_Payments), y = Total_Payments, fill = STATE)
) +
  geom_col() +
  coord_flip() +
  scale_fill_brewer(
    palette = "Set3" # Change the color palette
  ) +
  labs(
    title = "Top 10 States Receiving AmeriCorps Education Award Payments",
    subtitle = "Variation in payment totals across states",
    x = "State",
    y = "Total Payments",
    fill = "State",
    caption = "Source: AmeriCorps Segal Education Award Dataset"
  ) +
  theme_dark() # Use a darker theme instead of the default

plot1
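`options(scipen = 50)` changes how R prints numbers everywhere in the session. A more targeted alternative, assuming the scales package (which ggplot2 depends on) is available, is to format only the axis labels with `scales::label_dollar()`. The sketch uses the top three totals from the state table above.

```r
library(ggplot2)
library(scales)

# Top three state totals copied from the state_payment table
top3 <- data.frame(
  STATE          = c("PA", "NY", "GA"),
  Total_Payments = c(423676144, 397808433, 387688074)
)

plot_dollars <- ggplot(top3, aes(x = reorder(STATE, Total_Payments), y = Total_Payments)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  # Show axis labels in millions of dollars with an "M" suffix
  scale_y_continuous(labels = label_dollar(scale = 1e-6, suffix = "M")) +
  labs(x = "State", y = "Total Payments")

plot_dollars
```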

Analyze visualization 1

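This graph shows which states received the largest total Segal Education Award payments. Pennsylvania (PA) ranks first and is the only state above $400,000,000 (about $423.7 million), while Maryland (MD) is last among the top 10 at about $69.5 million. The graph also shows two large jumps between neighboring states: from Texas (TX, about $100.8 million) to California (CA, about $231.3 million), and from Wisconsin (WI, about $245.4 million) to Georgia (GA, about $387.7 million). The state table adds that the number of payments and the total do not move together: California has the most payment records (10,226) but ranks only fifth in total because its average payment (about $22,617) is much smaller than Georgia's (about $133,180).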
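The two jumps described above can be checked by computing the gap between each state and the next one down the ranking. The sketch copies the ten totals from the state_payment output and uses dplyr's `lead()`.

```r
library(dplyr)

# Top 10 state totals copied from the state_payment output
top10 <- tibble(
  STATE = c("PA", "NY", "GA", "WI", "CA", "TX", "MA", "IL", "WA", "MD"),
  Total_Payments = c(423676144, 397808433, 387688074, 245423791, 231279433,
                     100753633, 85114889, 78102887, 74958219, 69474647)
)

# Gap between each state and the next-ranked state, largest gaps first
gaps <- top10 |>
  mutate(next_state = lead(STATE),
         gap = Total_Payments - lead(Total_Payments)) |>
  filter(!is.na(gap)) |>
  arrange(desc(gap))

head(gaps, 2)
# Largest gaps: GA to WI (142,264,283), then CA to TX (130,525,800)
```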

Research Question 2

Can award year, payment type, and country category predict total payments?

model <- lm(TOTAL_PAYMENTS~YEAR+PAYMENT_TYPE+COUNTRY_CATEGORY, # The regression equation
data = americorps_clean
)

summary(model)

Call:
lm(formula = TOTAL_PAYMENTS ~ YEAR + PAYMENT_TYPE + COUNTRY_CATEGORY, 
    data = americorps_clean)

Residuals:
     Min       1Q   Median       3Q      Max 
 -197143   -34584   -15376    -1055 26897783 

Coefficients:
                                    Estimate Std. Error t value
(Intercept)                       -3705854.3   279240.9 -13.271
YEAR                                  1857.6      139.1  13.359
PAYMENT_TYPERepay Loans              32549.0     2376.4  13.697
PAYMENT_TYPEReturns                 -29294.2     3167.8  -9.248
PAYMENT_TYPEUncategorized Payment    -2778.7     7747.8  -0.359
COUNTRY_CATEGORYINTERNATIONAL       -24101.5    10001.1  -2.410
                                             Pr(>|t|)    
(Intercept)                       <0.0000000000000002 ***
YEAR                              <0.0000000000000002 ***
PAYMENT_TYPERepay Loans           <0.0000000000000002 ***
PAYMENT_TYPEReturns               <0.0000000000000002 ***
PAYMENT_TYPEUncategorized Payment               0.720    
COUNTRY_CATEGORYINTERNATIONAL                   0.016 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 321900 on 114181 degrees of freedom
Multiple R-squared:  0.004197,  Adjusted R-squared:  0.004154 
F-statistic: 96.26 on 5 and 114181 DF,  p-value: < 0.00000000000000022
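The coefficient names in this output, such as `PAYMENT_TYPERepay Loans`, come from R's treatment coding of factors: each non-reference level gets a 0/1 indicator column, and the first level alphabetically (Education Expenses for PAYMENT_TYPE, DOMESTIC for COUNTRY_CATEGORY) is the baseline absorbed into the intercept. The sketch shows this with `model.matrix()` on a few hypothetical rows that use the same factor levels.

```r
# A few hypothetical rows using the same factor levels as americorps_clean
toy <- data.frame(
  YEAR = c(2010, 2015, 2020, 2020),
  PAYMENT_TYPE = factor(c("Education Expenses", "Repay Loans",
                          "Returns", "Uncategorized Payment")),
  COUNTRY_CATEGORY = factor(c("DOMESTIC", "DOMESTIC", "INTERNATIONAL", "DOMESTIC"))
)

# Design matrix that lm() builds internally from the same right-hand side
X <- model.matrix(~ YEAR + PAYMENT_TYPE + COUNTRY_CATEGORY, data = toy)
colnames(X)
X  # row 1 (Education Expenses, DOMESTIC) has 0 in every indicator column
```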

Equation

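$$
\widehat{\text{TOTAL\_PAYMENTS}} = -3705854.3 + 1857.6\,\text{YEAR} + 32549.0\,D_{\text{Repay Loans}} - 29294.2\,D_{\text{Returns}} - 2778.7\,D_{\text{Uncategorized}} - 24101.5\,D_{\text{International}}
$$

Each D is 1 when a payment falls in that category and 0 otherwise; Education Expenses and DOMESTIC are the baseline categories.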
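To read the equation, plug in one case. For a domestic Repay Loans payment on an award earned in 2020, every indicator except Repay Loans is 0. The sketch does this arithmetic with the rounded coefficients printed above; `fitted_payment()` is a hypothetical helper. Because the YEAR coefficient is multiplied by roughly 2,000, its rounding can shift the result by up to about $100, so `predict(model, newdata = ...)` on the fitted model would be the exact route.

```r
# Coefficients copied (rounded) from summary(model)
b <- c(intercept = -3705854.3, year = 1857.6, repay = 32549.0,
       returns = -29294.2, uncategorized = -2778.7, international = -24101.5)

# Hypothetical helper: fitted payment for one case using the regression equation
fitted_payment <- function(year, repay = 0, returns = 0,
                           uncategorized = 0, international = 0) {
  unname(b["intercept"] + b["year"] * year + b["repay"] * repay +
         b["returns"] * returns + b["uncategorized"] * uncategorized +
         b["international"] * international)
}

fitted_payment(2020)             # Education Expenses, domestic: about 46,497.7
fitted_payment(2020, repay = 1)  # Repay Loans, domestic: about 79,046.7
```

The large negative intercept is the fitted value at YEAR = 0, far outside the 1995 to 2020 range of the data, so it has no practical meaning on its own.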

Adjusted R²

summary(model)$adj.r.squared
[1] 0.004153883
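Adjusted R² rescales R² for the number of predictors: adj R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of slope coefficients. Using R² = 0.004197 from the output, n = 114,187 (the 114,181 residual degrees of freedom plus 6 estimated coefficients) and p = 5 reproduces the reported value; with this many rows the adjustment barely changes anything.

```r
# Values read from summary(model)
r2 <- 0.004197    # Multiple R-squared
n  <- 114187      # observations (114181 residual df + 6 coefficients)
p  <- 5           # slope coefficients, excluding the intercept

adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 6)  # about 0.004153, matching 0.004153883 up to the rounding of r2
```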

Interpretation

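Several points stand out in this model. The overall F-test (F = 96.26 on 5 and 114,181 degrees of freedom) has a p-value below 2.2 × 10^-16. If none of the predictors were related to total payments, results this extreme would be very unlikely, so at least one predictor is associated with payment amounts. The 114,181 residual degrees of freedom reflect the very large sample rather than a good fit; with this many observations, even very small effects become statistically significant. YEAR, PAYMENT_TYPERepay Loans, and PAYMENT_TYPEReturns are all significant (p < 2 × 10^-16), COUNTRY_CATEGORYINTERNATIONAL is significant at the 0.05 level (p = 0.016), and PAYMENT_TYPEUncategorized Payment is not (p = 0.720). Holding the other variables constant, each later award year is associated with about $1,858 more per payment, Repay Loans payments average about $32,549 more than Education Expenses payments, and Returns average about $29,294 less. The adjusted R-squared, however, is 0.004154, indicating that the predictors (award year, payment type, and country category) explain only about 0.42% of the variability in total payment amounts, and the residual standard error of about $321,900 is far larger than the median payment of $3,676. For research question 2, these variables are statistically related to payments but predict them very poorly.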
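R prints the overall p-value as `< 0.00000000000000022` because that is the smallest value it reports (machine epsilon, about 2.2 × 10^-16); it is an upper bound, not the actual p-value. The F-statistic can be rebuilt from R² and the two degrees of freedom, and `pf()` gives the upper-tail probability directly. A sketch with the values from the output:

```r
# Values read from summary(model)
r2     <- 0.004197
df_num <- 5        # numerator df (number of predictors)
df_den <- 114181   # denominator df (residual degrees of freedom)

# Overall F-statistic rebuilt from R-squared
f_stat <- (r2 / df_num) / ((1 - r2) / df_den)
f_stat  # about 96.25, close to the reported 96.26

# Exact upper-tail p-value, far smaller than the printed bound
p_value <- pf(f_stat, df_num, df_den, lower.tail = FALSE)
p_value < .Machine$double.eps  # TRUE
```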

Diagnostic Plots

par(mfrow=c(2,2)) # The diagnostic plots of the model

plot(model)

Diagnostic interpretation: The regression diagnostic plots suggest that the multiple linear regression model does not fit the data particularly well. In the Residuals vs Fitted plot, the points do not appear randomly scattered around the horizontal line and instead show an increasing spread as fitted values increase. This pattern indicates heteroscedasticity.

The Q–Q Residual plot shows strong deviation from the reference line, especially in the upper tail. This suggests that the residuals are not normally distributed and that extreme outliers may be present within the dataset.

The Scale–Location plot further supports the presence of heteroscedasticity because the spread of residuals increases rather than remaining evenly distributed across fitted values.
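The visual impression of heteroscedasticity can be backed by a formal test. The sketch below runs a studentized (Koenker) Breusch-Pagan test by hand in base R: regress the squared residuals on the predictor and compare n·R² with a chi-squared distribution. It uses simulated data (`set.seed(1)`, with error spread growing with x by construction) rather than the AmeriCorps data; `lmtest::bptest()` implements the same test if that package is installed.

```r
set.seed(1)

# Simulated data whose error spread grows with x (heteroscedastic by construction)
x <- 1:500
y <- 2 * x + rnorm(500, sd = x)
fit <- lm(y ~ x)

# Studentized Breusch-Pagan: regress squared residuals on the predictor
aux <- lm(residuals(fit)^2 ~ x)
bp_stat <- length(x) * summary(aux)$r.squared
bp_df   <- 1   # number of predictors in the auxiliary regression
bp_p    <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

c(statistic = bp_stat, p_value = bp_p)  # a small p-value suggests non-constant variance
```

For the AmeriCorps model, the auxiliary regression would use the same predictors as `model` (YEAR, PAYMENT_TYPE, COUNTRY_CATEGORY), giving 5 degrees of freedom instead of 1.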

The Residuals vs Leverage plot identifies a few observations with relatively high leverage and large residual values. These points may be influential observations that disproportionately affect the regression model and contribute to instability in predictions.
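Influential points like those in the Residuals vs Leverage plot can be listed with `cooks.distance()`. A common rule of thumb (a convention, not a strict cutoff) flags observations whose Cook's distance exceeds 4/n. The sketch plants one extreme value in otherwise linear toy data; on the real model, the same steps would start from `cooks.distance(model)`.

```r
# Toy data: a clean linear trend with small alternating noise, plus one planted outlier
x <- 1:20
y <- 2 * x + rep(c(0.5, -0.5), 10)
y[20] <- 200                      # extreme value at a high-leverage x
fit <- lm(y ~ x)

cd <- cooks.distance(fit)
threshold <- 4 / length(cd)       # rule-of-thumb cutoff
flagged <- which(cd > threshold)
flagged                           # observation 20 stands out
```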

Visualization 2

https://public.tableau.com/views/Totalpaymentsthroughouttheyears/Totalpaymentsthroughouttheyears?:language=en-US&publish=yes&:sid=&:redirect=auth&:display_count=n&:origin=viz_share_link

Analyze visualization 2

In the Education Expenses panel, education payments increase gradually over the years, with a small dip in 2011 and a larger decline after 2013. Interest payments and other award types within Education Expenses stay very low in every year. In the Repay Loans panel, both education and interest payments rise over time, but education payments climb much higher, while interest payments peak at around $10,400,000.
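A chart like this one summarizes payments by year within each payment type and earned award type. Doing that aggregation in R before exporting to Tableau produces one row per combination instead of one row per payment, which may also ease the size limit mentioned under Future research. The sketch uses hypothetical rows; the EARNED_AWARD_TYPE labels are placeholders rather than the dataset's actual codes, and the file name in the commented `write.csv()` line is made up.

```r
library(dplyr)

# Hypothetical payment-level rows (award type labels are placeholders)
toy <- tibble(
  YEAR              = c(2011, 2011, 2012, 2012, 2012),
  PAYMENT_TYPE      = c("Education Expenses", "Repay Loans",
                        "Education Expenses", "Education Expenses", "Repay Loans"),
  EARNED_AWARD_TYPE = c("Education", "Interest", "Education", "Education", "Education"),
  TOTAL_PAYMENTS    = c(500, 40, 300, 200, 900)
)

# One row per year x payment type x award type, ready to export for Tableau
yearly <- toy |>
  group_by(YEAR, PAYMENT_TYPE, EARNED_AWARD_TYPE) |>
  summarise(Total_Payments = sum(TOTAL_PAYMENTS), .groups = "drop")

yearly
# write.csv(yearly, "yearly_totals.csv", row.names = FALSE)  # hypothetical file name
```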

Conclusion

This project explored patterns in AmeriCorps education award payments by examining how payments vary across states and institutions and whether award year, payment type, and country category could predict total payment amounts. The visualizations and summary tables showed that payments are distributed unevenly: a few states, led by Pennsylvania, New York, and Georgia, received far larger totals than the rest. Several of the largest recipients were loan servicers and the United States Department of Education rather than colleges, which is consistent with many awards being used to repay student loans. These differences may also reflect variation in institutional enrollment, the number of participating AmeriCorps members, and regional participation patterns. The multiple linear regression found that award year, payment type, and country category are statistically associated with payment amounts, but its adjusted R^2 of 0.004154 means they explain only about 0.42% of the variation, so on their own they are weak predictors of how much an institution is paid.

Future research

In the future, I would like to add more variables to the model to see how much they raise the adjusted R^2 and whether they are significant. I would also like to add more variables to the Tableau graph; I left some out because the data was more than Tableau could handle, and it recommended using only 1,000. Finally, I would like to work with an updated version of the dataset, since its most recent update was on November 27, 2023.

References

AmeriCorps. “Segal AmeriCorps Education Award Detailed Payments by Institution 2020.” Data.gov, https://catalog.data.gov/dataset/segal-americorps-education-award-detailed-payments-by-institution-2020

AmeriCorps. “Segal AmeriCorps Education Award.” AmeriCorps, n.d., https://www.americorps.gov (the site was being updated and unavailable as of 5/14/2026).

AmeriCorps. “What Can the Education Award Be Used For?” AmeriCorps, n.d., https://www.americorps.gov (the site was being updated and unavailable as of 5/14/2026).

Tableau Public visualization (Visualization 2), https://public.tableau.com/views/Totalpaymentsthroughouttheyears/Totalpaymentsthroughouttheyears?:language=en-US&publish=yes&:sid=&:redirect=auth&:display_count=n&:origin=viz_share_link