library(tidyverse) # Load all the libraries that I will need or might use
library(dplyr)
library(ggplot2)
library(RColorBrewer)
library(highcharter)
Final project data 110
Source: https://coredo.eu/dictionary/payment-institution/
Introduction
This project explores AmeriCorps Segal Education Award payments made to colleges and universities across the United States and at international campuses. The dataset used in this analysis is titled “Segal AmeriCorps Education Award Detailed Payments by Institution 2020” and can be found at https://catalog.data.gov/dataset. The data was collected and published by AmeriCorps, a United States federal agency focused on national service and volunteerism. No README accompanies the file. The dataset contains information about institutions that received AmeriCorps education award payments, including payment amounts, institution names, states, campuses, award types, and payment categories. I selected this dataset because education affordability and financial assistance programs are important topics for students. Since many college students rely on financial aid and educational support, I wanted to better understand how AmeriCorps education award money is distributed across institutions and states.
Research questions:
1. How do AmeriCorps education award payments vary across states and institutions?
2. Can variables such as award year, payment type, and country category predict total payment amounts?
The dataset includes the following important variables:
INSTITUTION – Name of the college or university (categorical)
SYSTEM – University system associated with the institution (categorical)
STATE – State abbreviation where payments were sent (categorical)
COUNTRY_CATEGORY – Domestic or international category (categorical)
YEAR – Year the award was earned (quantitative)
EARNED_AWARD_TYPE – Type of award earned (categorical)
PAYMENT_TYPE – Type of payment made (categorical)
TOTAL_PAYMENTS – Dollar amount paid to the institution (quantitative)
Background info
The AmeriCorps program is a national service initiative in the United States that promotes community involvement while providing educational and financial support to participants. One of its most important benefits is the Segal AmeriCorps Education Award, which members earn after successfully completing a term of service. The award can be used to pay for higher education expenses or to repay qualified student loans. The value of a full-time award is connected to the maximum Federal Pell Grant amount for the year in which the service is completed, causing award values to change over time (AmeriCorps). AmeriCorps reports that members have earned billions of dollars in educational awards since the program began. These awards help reduce financial burdens for participants while also directing substantial payments toward colleges and universities where members choose to use their awards (AmeriCorps).
Additionally, the Segal Education Award may be used in several different ways, including tuition payments and qualified student loan repayment. Payment amounts are not always identical because they can vary according to service year, payment type, and institutional factors. Studying these payment patterns may help identify whether characteristics such as award year, payment type, and country category contribute to predicting payment amounts across institutions.
setwd("C:/Users/kenne/Downloads") # Set the working directory
americorps <- read.csv("Segal_AmeriCorps_Education_Award__Detailed_Payments_by_Institution_2020.csv") # Put the dataset into a variable for coding
EDA
americorps_clean <- americorps |> # Cleaning the dataset by removing rows with NAs in the variables I will be using
  filter(
    !is.na(STATE),
    !is.na(INSTITUTION),
    !is.na(TOTAL_PAYMENTS),
    !is.na(YEAR),
    !is.na(PAYMENT_TYPE),
    !is.na(COUNTRY_CATEGORY)
  )
americorps_clean <- americorps_clean |> # Converting the variables so I won't run into any issues in later coding
  mutate(
    STATE = factor(STATE),
    INSTITUTION = factor(INSTITUTION),
    PAYMENT_TYPE = factor(PAYMENT_TYPE),
    COUNTRY_CATEGORY = factor(COUNTRY_CATEGORY),
    YEAR = as.numeric(YEAR) # Turn into a number for coding and the graph
)
summary(americorps_clean) # Shows how many values each variable has and what class and mode each one is
                      INSTITUTION        SYSTEM             CAMPUS
 UNIVERSITY OF WISCONSIN    :   935   Length:114187      Length:114187
 UNIVERSITY OF CALIFORNIA   :   930   Class :character   Class :character
 CALIFORNIA STATE UNIVERSITY:   680   Mode  :character   Mode  :character
 UNIVERSITY OF TEXAS        :   422
 NAVIENT                    :   394
 HARVARD UNIVERSITY         :   345
 (Other)                    :110481
   COLLEGE              STATE           COUNTRY_CATEGORY        YEAR
 Length:114187      CA     :10226   DOMESTIC     :113136   Min.   :1995
 Class :character   NY     : 7996   INTERNATIONAL:  1051   1st Qu.:2002
 Mode  :character   PA     : 5893                          Median :2008
                    TX     : 5666                          Mean   :2008
                    MA     : 4490                          3rd Qu.:2014
                    IL     : 4374                          Max.   :2020
                    (Other):75542
 EARNED_AWARD_TYPE                 PAYMENT_TYPE   TOTAL_PAYMENTS
 Length:114187      Education Expenses   :75896   Min.   : -186487
 Class :character   Repay Loans          :24461   1st Qu.:     906
 Mode  :character   Returns              :12002   Median :    3676
                    Uncategorized Payment: 1828   Mean   :   27616
                                                  3rd Qu.:   11813
                                                  Max.   :26961959
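The summary reports a minimum TOTAL_PAYMENTS of -186487, so at least some rows are negative. The sketch below shows one way to check which payment types those negative rows fall under. The data frame `toy_payments` and its values are invented for illustration and are not taken from the dataset; with the real data, `americorps_clean` would take its place.

```r
# Hypothetical toy data standing in for americorps_clean (values are invented)
toy_payments <- data.frame(
  PAYMENT_TYPE = factor(c("Education Expenses", "Repay Loans", "Returns",
                          "Returns", "Education Expenses", "Repay Loans")),
  TOTAL_PAYMENTS = c(3676, 11813, -906, -250, 906, 27616)
)

# Flag rows with a negative payment amount
toy_payments$is_negative <- toy_payments$TOTAL_PAYMENTS < 0

# Count negative (TRUE) and non-negative (FALSE) rows within each payment type
negative_by_type <- table(toy_payments$PAYMENT_TYPE, toy_payments$is_negative)
print(negative_by_type)

# Share of all rows that are negative
negative_share <- mean(toy_payments$is_negative)
print(negative_share)
```

If the negative rows in the real data turn out to be concentrated in one payment type, that would be worth keeping in mind when reading the totals and the regression results.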
Research Question 1
How do AmeriCorps payments vary across states and institutions?
state_payment <- americorps_clean |> # Total and average payments by state
  group_by(STATE) |>
  summarise(
    Total_Payments = sum(TOTAL_PAYMENTS),
    Average_Payment = mean(TOTAL_PAYMENTS),
    Count = n(),
    .groups = "drop" # Return an ungrouped tibble
  ) |>
  arrange(desc(Total_Payments)) |> # The highest number first and then descending
  slice(1:10) # Only want the top 10 states in terms of payment
state_payment
# A tibble: 10 × 4
   STATE Total_Payments Average_Payment Count
   <fct>          <dbl>           <dbl> <int>
 1 PA        423676144.          71895.  5893
 2 NY        397808433.          49751.  7996
 3 GA        387688074.         133180.  2911
 4 WI        245423791.          94831.  2588
 5 CA        231279433.          22617. 10226
 6 TX        100753633.          17782.  5666
 7 MA         85114889.          18957.  4490
 8 IL         78102887.          17856.  4374
 9 WA         74958219.          23308.  3216
10 MD         69474647.          32239.  2155
institution_payment <- americorps_clean |> # Similar to the code above, but by institution instead of state
  group_by(INSTITUTION) |>
  summarise(
    Total_Payments = sum(TOTAL_PAYMENTS),
    .groups = "drop" # Return an ungrouped tibble
  ) |>
  arrange(desc(Total_Payments)) |>
  slice(1:10)
institution_payment
# A tibble: 10 × 2
   INSTITUTION                                             Total_Payments
   <fct>                                                            <dbl>
 1 UNITED STATES DEPARTMENT OF EDUCATION                       250293353.
 2 NAVIENT                                                     210641458.
 3 GREAT LAKES HIGHER EDUCATION CORPORATION AND AFFILIATES     205718331.
 4 FEDLOAN SERVICING                                           160648767.
 5 NELNET                                                      147912936.
 6 ACS EDUCATION SERVICES                                       54378216.
 7 AMERICAN EDUCATION SERVICES                                  53497538.
 8 RELAY GRADUATE SCHOOL OF EDUCATION                           41678567.
 9 JOHNS HOPKINS UNIVERSITY                                     39440285.
10 LOYOLA MARYMOUNT UNIVERSITY                                  33253720.
Visualization 1
options(scipen = 50) # Show full numbers instead of scientific notation
plot1 <- ggplot(
  state_payment,
  aes(x = reorder(STATE, Total_Payments), y = Total_Payments, fill = STATE)
) +
  geom_col() +
  coord_flip() +
  scale_fill_brewer(
    palette = "Set3" # Changing the color palette
  ) +
  labs(
    title = "Top 10 States Receiving AmeriCorps Education Award Payments",
    subtitle = "Variation in payment totals across states",
    x = "State",
    y = "Total Payments",
    fill = "State",
    caption = "Source: AmeriCorps Segal Education Award Dataset"
  ) +
  theme_dark() # Changing the theme so it isn't the basic one
plot1
Analyze visualization 1
This graph shows which states received the most in Segal Education Award payments. Pennsylvania ranks first and is the only state whose total passes 400,000,000, while Maryland ranks last among the top 10. The graph also shows two big jumps in total payments between neighboring states: the first from Texas to California and the second from Wisconsin to Georgia.
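The two jumps can also be checked numerically. The sketch below copies the totals from the state_payment output above (rounded to whole dollars, as printed) and computes the gap between each state and the next one down the ranking. The object names `state_totals`, `gaps`, and `sorted_gaps` are illustrative only.

```r
# Totals copied from the state_payment output (rounded to whole dollars)
state_totals <- c(PA = 423676144, NY = 397808433, GA = 387688074,
                  WI = 245423791, CA = 231279433, TX = 100753633,
                  MA = 85114889, IL = 78102887, WA = 74958219,
                  MD = 69474647)

# Gap between each state and the next-ranked state below it
gaps <- -diff(state_totals)
names(gaps) <- paste(head(names(state_totals), -1),
                     tail(names(state_totals), -1),
                     sep = " over ")

# Sort so the largest jumps come first and show the top two
sorted_gaps <- sort(gaps, decreasing = TRUE)
print(head(sorted_gaps, 2))
```

The two largest gaps are Georgia over Wisconsin (142,264,283) and California over Texas (130,525,800), which matches what the bar chart shows.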
Research Question 2
Can award year, payment type, and country category predict total payments?
model <- lm(
  TOTAL_PAYMENTS ~ YEAR + PAYMENT_TYPE + COUNTRY_CATEGORY, # The regression equation
  data = americorps_clean
)
summary(model)

Call:
lm(formula = TOTAL_PAYMENTS ~ YEAR + PAYMENT_TYPE + COUNTRY_CATEGORY,
    data = americorps_clean)

Residuals:
     Min       1Q   Median       3Q      Max
 -197143   -34584   -15376    -1055 26897783

Coefficients:
                                    Estimate Std. Error t value            Pr(>|t|)
(Intercept)                       -3705854.3   279240.9 -13.271 <0.0000000000000002 ***
YEAR                                  1857.6      139.1  13.359 <0.0000000000000002 ***
PAYMENT_TYPERepay Loans              32549.0     2376.4  13.697 <0.0000000000000002 ***
PAYMENT_TYPEReturns                 -29294.2     3167.8  -9.248 <0.0000000000000002 ***
PAYMENT_TYPEUncategorized Payment    -2778.7     7747.8  -0.359               0.720
COUNTRY_CATEGORYINTERNATIONAL       -24101.5    10001.1  -2.410               0.016 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 321900 on 114181 degrees of freedom
Multiple R-squared:  0.004197,	Adjusted R-squared:  0.004154
F-statistic: 96.26 on 5 and 114181 DF,  p-value: < 0.00000000000000022
Equation
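total payments = -3705854.3 + 1857.6(YEAR) + 32549.0(PAYMENT_TYPERepay Loans) - 29294.2(PAYMENT_TYPEReturns) - 2778.7(PAYMENT_TYPEUncategorized Payment) - 24101.5(COUNTRY_CATEGORYINTERNATIONAL)
Each PAYMENT_TYPE and COUNTRY_CATEGORY term is an indicator that equals 1 when the payment falls in that category and 0 otherwise. The baseline categories are Education Expenses and DOMESTIC.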
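To show how the equation is used, here is a minimal sketch that plugs values into it. The coefficients are copied from the summary(model) output as printed, so they are rounded and the results will differ slightly from what predict(model, ...) would return. The helper `predict_payment()` and the vector `coefs` are hypothetical names made up for this example.

```r
# Coefficients copied from the summary(model) output (rounded as printed)
coefs <- c(intercept = -3705854.3, year = 1857.6,
           repay_loans = 32549.0, returns = -29294.2,
           uncategorized = -2778.7, international = -24101.5)

# Hypothetical helper: predicted TOTAL_PAYMENTS for one combination of inputs.
# The baseline levels are "Education Expenses" and "DOMESTIC", so their
# indicator terms are all 0.
predict_payment <- function(year,
                            payment_type = "Education Expenses",
                            country_category = "DOMESTIC") {
  unname(
    coefs["intercept"] +
      coefs["year"] * year +
      coefs["repay_loans"] * (payment_type == "Repay Loans") +
      coefs["returns"] * (payment_type == "Returns") +
      coefs["uncategorized"] * (payment_type == "Uncategorized Payment") +
      coefs["international"] * (country_category == "INTERNATIONAL")
  )
}

# Domestic education-expense payment for award year 2010: about 27921.7
print(predict_payment(2010))

# Same year, but a loan repayment: about 60470.7
print(predict_payment(2010, payment_type = "Repay Loans"))
```

The second prediction is higher than the first by exactly the Repay Loans coefficient, which is how each indicator term should be read: a shift relative to the baseline category with the other variables held fixed.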
Adjusted R²
summary(model)$adj.r.squared
[1] 0.004153883
Interpretation
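Several things stand out in this model. The p-value for the overall F-test is below 0.00000000000000022, so results like these would be very unlikely if none of the predictors were related to total payments. The model has 114181 residual degrees of freedom, which reflects the very large number of observations; with a sample this large, even small effects can be statistically significant. YEAR, PAYMENT_TYPERepay Loans, and PAYMENT_TYPEReturns are highly significant, each with a p-value below 0.0000000000000002. COUNTRY_CATEGORYINTERNATIONAL is significant at the 0.05 level (p = 0.016), while PAYMENT_TYPEUncategorized Payment is not (p = 0.720). However, the adjusted R-squared is 0.004154, indicating that only about 0.42% of the variability in AmeriCorps total payment amounts is explained by the predictor variables included in the model (award year, payment type, and country category). The predictors are statistically significant, but the model explains very little of the variation in payments.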
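The adjusted R-squared can be reproduced from the other numbers in the output using the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1). The sketch below uses the printed Multiple R-squared, the 114187 rows reported by summary(americorps_clean), and the 5 estimated slope coefficients. The variable names are illustrative only.

```r
r_squared <- 0.004197    # Multiple R-squared, as printed by summary(model)
n_obs <- 114187          # number of rows in americorps_clean
n_predictors <- 5        # estimated coefficients, not counting the intercept

# Standard adjusted R-squared formula
adj_r_squared <- 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)
print(adj_r_squared)
```

Because the sample is so large relative to the number of predictors, the adjustment is tiny, and the adjusted value is almost the same as the unadjusted one.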
Diagnostic Plots
par(mfrow=c(2,2)) # The diagnostic plots of the model
plot(model)
Diagnostic interpretation: The regression diagnostic plots suggest that the multiple linear regression model does not fit the data particularly well. In the Residuals vs Fitted plot, the points do not appear randomly scattered around the horizontal line and instead show an increasing spread as fitted values increase. This pattern indicates heteroscedasticity.
The Q–Q Residual plot shows strong deviation from the reference line, especially in the upper tail. This suggests that the residuals are not normally distributed and that extreme outliers may be present within the dataset.
The Scale–Location plot further supports the presence of heteroscedasticity because the spread of residuals increases rather than remaining evenly distributed across fitted values.
The Residuals vs Leverage plot identifies a few observations with relatively high leverage and large residual values. These points may be influential observations that disproportionately affect the regression model and contribute to instability in predictions.
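As a rough numeric companion to the Scale–Location plot, the sketch below fits a model to simulated data in which the noise grows with the predictor, then checks whether the size of the residuals rises with the fitted values. The data are simulated, not the AmeriCorps data, and the names `sim_model` and `spread_trend` are made up for this example. It is an informal check under those assumptions, not a formal test.

```r
set.seed(110) # arbitrary seed so the simulation is reproducible

# Simulated data (not the AmeriCorps data): the noise gets larger as x grows
n <- 500
x <- runif(n, min = 1, max = 10)
y <- 2 + 3 * x + rnorm(n, mean = 0, sd = x)

sim_model <- lm(y ~ x)

# If residual spread is constant, the absolute residuals should show
# little or no correlation with the fitted values
spread_trend <- cor(fitted(sim_model), abs(resid(sim_model)))
print(spread_trend)
```

A clearly positive correlation points to the same increasing spread that the plots show. Applying the same two lines to `model` would give a quick number to go with the diagnostic plots.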
Visualization 2
https://public.tableau.com/views/Totalpaymentsthroughouttheyears/Totalpaymentsthroughouttheyears?:language=en-US&publish=yes&:sid=&:redirect=auth&:display_count=n&:origin=viz_share_link
Analyze visualization 2
In the education expenses column, education payments show a gradual increase, with a small decrease in 2011 and a larger decrease after 2013. Interest payments and other payments in this column stay very low in every year. In the repay loans column, education and interest payments both go up. However, education payments climb very high, while interest payments peak at around 10,400,000.
Conclusion
This project explored patterns in AmeriCorps education award payments by examining how payments vary across states and institutions and whether variables such as award year, payment type, and country category could predict total payment amounts. The visualizations showed that payment distributions differed substantially across states and institutions, with some locations receiving much larger payment totals than others. These differences may reflect variations in institutional enrollment, the number of participating AmeriCorps members, and regional participation patterns. A multiple linear regression was used to determine whether award year, payment type, and country category were useful predictors of total payment amounts. The model produced an adjusted R^2 value of 0.004154, indicating that approximately 0.42% of the variation in payment amounts was explained by the variables included in the analysis, so these variables alone predict payment amounts poorly even though several of them are statistically significant. One interesting finding from the project was that payment amounts were distributed unevenly across states and institutions rather than being spread uniformly.
Future research
In the future, I would like to add more variables to the model to see how much they raise the adjusted R^2 and whether they are significant. I also want to add more to the Tableau graph, since there were certain variables I did not include: the data was more than Tableau could handle, and it recommended that I use only 1000. Finally, I would like the dataset to be updated, since its most recent update was on November 27, 2023.
References
https://catalog.data.gov/dataset/segal-americorps-education-award-detailed-payments-by-institution-2020
AmeriCorps. Segal AmeriCorps Education Award. AmeriCorps, n.d., https://www.americorps.gov (the website was being updated as of 5/14/2026 with no known completion date, so the page could not be accessed)
AmeriCorps. What Can the Education Award Be Used For? AmeriCorps, n.d., https://www.americorps.gov (the website was being updated as of 5/14/2026 with no known completion date, so the page could not be accessed)
https://public.tableau.com/views/Totalpaymentsthroughouttheyears/Totalpaymentsthroughouttheyears?:language=en-US&publish=yes&:sid=&:redirect=auth&:display_count=n&:origin=viz_share_link