In this project we analyze data on various four-year colleges. To better understand the statistical properties of the data collected, I propose the following 10 questions:
(1). What is the mean cost of all of the colleges in the data set?
(2). What is the correlation between cost and the average SAT score?
(3). What is the distribution of cost?
(4). What is the average percentage of Pell Grant recipients?
(5). What is the correlation between admission and completion rate?
(6). What is the median of average SAT scores?
(7). What is the variance of median family income across all colleges?
(8). What is the distribution of faculty salaries?
(9). What is the mean faculty salary in each U.S. region?
(10). What is the distribution of median family income?
To further my understanding and to open my mind to some different ideas, I consulted ChatGPT for 10 of its own questions with the same parameters. It proposed the following:
(11). Create a histogram of the admission rate (AdmitRate).
(12). Create a boxplot of average SAT scores (AvgSAT) grouped by region (Region).
(13). Create a barplot comparing the average tuition for in-state (TuitionIn) vs out-of-state (TuitonOut) students.
(14). Create a pie chart showing the proportion of schools by control type (Control).
(15). Is there a correlation between admission rate (AdmitRate) and completion rate (CompRate)?
(16). How strongly does median family income (MedIncome) correlate with average debt (Debt)?
(17). What is the relationship (scatterplot & correlation) between net price (NetPrice) and completion rate (CompRate)?
(18). Do colleges with higher enrollment (Enrollment) tend to have higher or lower admit rates (AdmitRate)?
(19). Compare mean ACT score (MidACT) between online (Online) and in-person colleges.
(20). Create a pie chart showing the distribution of colleges by highest degree offered (HighDegree).
Following ChatGPT’s questions, I selected the 10 questions I found most interesting to analyze: 4, 5, 6, 7, 8, 10, 11, 13, 16, and 18.
## Average percentage of recipients: 37.85296
The average percentage of Pell Grant recipients is about 37.9%. That is, averaged across all schools in the data set, about 37.9% of students receive a Pell Grant, or just over one in three.
## Correlation between admission and completion rate: -0.3482341
A correlation of -0.35 suggests that more selective schools (those that are harder to get into and so have lower admission rates) tend to have higher completion rates. This is plausible: schools with high admission standards are more likely to enroll students who are well prepared academically or who have strong motivation and work ethic. The correlation is only moderate, though, so selectivity accounts for a limited part of the variation in completion rates.
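To make explicit what `use = "complete.obs"` does when a school is missing a value, here is a minimal sketch using only the six rows printed by `head(college)` in the analysis code. It is an illustration on a tiny subset, not a reproduction of the full-data result, and the variable names `admit`, `comp`, and `keep` are introduced here for illustration.

``` r
# First six rows of the data set, copied from the head(college) output in this report
admit <- c(0.9027, 0.9181, NA, 0.8123, 0.9787, 0.5330)   # AdmitRate
comp  <- c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87)     # CompRate

# use = "complete.obs" drops any row where either variable is NA (row 3 here)
r_complete <- cor(admit, comp, use = "complete.obs")

# The same calculation, dropping the incomplete rows by hand
keep <- complete.cases(admit, comp)
r_manual <- cor(admit[keep], comp[keep])

sum(keep)                        # number of rows actually used: 5
all.equal(r_complete, r_manual)  # TRUE
```

The point of the comparison is that the reported correlation describes only the schools with both an admission rate and a completion rate on record.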
## Median of average SAT score: 1121
The median of the schools' average SAT scores is 1121, which is below 1200, a score commonly considered “good.” Because this is a median, half of the schools in the sample report an average SAT score at or below 1121 and half report one at or above it.
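Because the median and the mean answer different questions, it can help to compute both side by side. The sketch below uses only the six `AvgSAT` values printed by `head(college)` in the analysis code, so its numbers describe that small subset and not the full data set; the name `avg_sat` is introduced here for illustration.

``` r
# AvgSAT for the first six schools, copied from the head(college) output
avg_sat <- c(929, 1195, NA, 1322, 935, 1278)

med <- median(avg_sat, na.rm = TRUE)  # middle value of the five non-missing scores
avg <- mean(avg_sat, na.rm = TRUE)    # arithmetic mean of the same five scores

med   # 1195
avg   # 1131.8

# Share of schools (with a score on record) at or below the median
mean(avg_sat <= med, na.rm = TRUE)    # 0.6
```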
## Variance of MedIncome: 522.4814
## Standard deviation: 22.85785
The variance of median family income is 522.48. Because the data is expressed in thousands of dollars, the variance is in squared thousands of dollars. Taking the square root of the variance gives a standard deviation of about 22.86, meaning that a college's median family income typically differs from the overall mean by about $22,860.
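The step from variance to standard deviation to dollars can be checked with a few lines of arithmetic. This is a minimal sketch that starts from the reported variance; the variable names are introduced here for illustration.

``` r
# Reported variance of MedIncome, in (thousands of dollars)^2
var_medincome <- 522.4814

# The square root returns to the original unit: thousands of dollars
sd_medincome <- sqrt(var_medincome)
round(sd_medincome, 2)          # 22.86

# Convert to dollars
sd_dollars <- sd_medincome * 1000
round(sd_dollars, -1)           # 22860

# The same identity holds on raw data: sd(x)^2 equals var(x)
med_income <- c(23.6, 34.5, 15.0, 44.8, 22.1, 66.7)  # first six rows from head(college)
all.equal(sd(med_income)^2, var(med_income))          # TRUE
```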
Using the histogram, we can observe that most schools have an average faculty salary between 4,000 and 9,000. The most common range, that is, the range with the tallest bars, is between 6,000 and 8,000.
According to the graph shown above, the most common range for the median family income at a college is between 25 and 50 thousand dollars, and the least common is above 100 thousand dollars. This suggests that most colleges draw their students largely from middle-class families.
The above histogram shows the frequency distribution of admission rates. The highest frequencies are between 60 and 80 percent. Admission rates become less frequent the further they fall below 60 percent, and rates above 80 percent also become progressively less frequent. This suggests that admission rates of around 60 to 80 percent are the most common among colleges.
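Statements read off a histogram can be backed by the bin counts themselves, which `hist()` returns when called with `plot = FALSE`. The sketch below uses only the six `AdmitRate` values printed by `head(college)` in the analysis code (stored as proportions between 0 and 1). Those six schools are not representative of the full data set, so the counts illustrate the technique and not the distribution described above. The bin width of 0.2 is an assumption chosen for the example.

``` r
# AdmitRate for the first six schools, copied from the head(college) output
admit <- c(0.9027, 0.9181, NA, 0.8123, 0.9787, 0.5330)

# Compute the bins without drawing anything; missing values are dropped
h <- hist(admit, breaks = seq(0, 1, by = 0.2), plot = FALSE)

# One row per bin: (lower, upper] and the number of schools in it
bins <- data.frame(lower = head(h$breaks, -1),
                   upper = h$breaks[-1],
                   count = h$counts)
bins

# The bin with the most schools
bins[which.max(bins$count), ]
```

Running the same call on `college$AdmitRate` would give the counts behind the histogram in this report.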
Seen above, the bar plot compares the average tuition for in-state students against that for out-of-state students. We can observe that the mean tuition for out-of-state students is about 2,500 dollars more than for their in-state counterparts. This suggests that it is more expensive, in terms of tuition, to attend school out-of-state.
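The tuition gap can also be computed school by school, which guarantees that both averages are based on the same set of colleges. This is a minimal sketch on the six rows printed by `head(college)` in the analysis code; these six schools are not representative of the full data set, so the result will not match the overall figure quoted above. The names `both` and `gap` are introduced here for illustration.

``` r
# TuitionIn and TuitonOut for the first six schools, copied from the head(college) output
tuition_in  <- c(9857, 8328, 6900, 10280, 11068, 10780)
tuition_out <- c(18236, 19032, 6900, 21480, 19396, 28100)

# Keep only schools that report both values, then take the per-school gap
both <- complete.cases(tuition_in, tuition_out)
gap  <- tuition_out[both] - tuition_in[both]

mean(gap)     # mean per-school gap
median(gap)   # median per-school gap, less sensitive to a few large gaps
```

Reporting the mean and median gap together makes it clear which "average" a sentence in the write-up refers to.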
## Correlation between family income and debt: -0.1207221
A correlation of -0.121 is weak and negative. It means that, on average, colleges with higher median family incomes tend to have slightly lower average student debt. However, the correlation is very weak, which suggests that, while income may matter, other factors weigh more heavily on debt than differences in income.
## Correlation of enrollment and admission rates: -0.06181281
Based on a correlation of -0.062 and the plot shown above, schools with larger enrollment tend to have slightly lower admission rates. However, the relationship is so weak that it is almost negligible. This suggests that a larger school is not necessarily more or less selective than a smaller one.
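One way to put the words "moderate", "weak", and "negligible" on a common scale is to square each correlation: for a simple linear relationship, r squared is the share of variance in one variable that is linearly associated with the other. The sketch below applies this to the three correlations reported in this analysis; the vector names are introduced here for illustration.

``` r
# Correlations reported in this analysis
r <- c(admit_vs_completion = -0.3482341,
       income_vs_debt      = -0.1207221,
       enrollment_vs_admit = -0.06181281)

# Share of variance linearly associated with the other variable
r_squared <- r^2
round(r_squared, 3)
#> admit_vs_completion      income_vs_debt enrollment_vs_admit
#>               0.121               0.015               0.004
```

On this scale, admission rate is associated with roughly 12% of the variation in completion rate, while the other two relationships account for under 2% each.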
This analysis explored various factors related to four-year college statistics, focusing on cost, admissions, completion, income, debt, and faculty data. Ten key questions were investigated using descriptive statistics, correlations, and visualizations.
Pell Grant Recipients – On average across schools, about 37.9% of students received Pell Grants, indicating that just over one in three students benefits from this form of federal aid.
Admission vs Completion Rate – A moderate negative correlation (-0.35) suggests that more selective colleges (lower admission rates) tend to have higher completion rates.
Average SAT Scores – The median of the schools' average SAT scores was 1121, so half of the schools report an average SAT score at or below that value.
Median Family Income Variance – The variance was 522.48 (in squared thousands of dollars), corresponding to a standard deviation of about $22,860, showing notable income diversity across colleges.
Faculty Salaries – Most schools have an average faculty salary between $4,000 and $9,000, with the highest frequency between $6,000 and $8,000.
Median Family Income Distribution – Most colleges have a median family income between $25,000 and $50,000, with median incomes above $100,000 being the least common.
Admission Rate Distribution – The majority of schools have admission rates between 60% and 80%, suggesting that most institutions are moderately selective.
Tuition Comparison – Out-of-state tuition was approximately $2,500 higher on average than in-state tuition, highlighting a financial advantage for in-state students.
Family Income vs Debt – A weak negative correlation (-0.12) indicates that colleges with higher median family incomes tend to have slightly lower average student debt, though the relationship is minimal.
Enrollment vs Admission Rate – The correlation (-0.06) was nearly zero, implying that college size has little to no relationship with selectivity.
Overall, the data set shows that while economic and academic variables are interrelated, many relationships are weak or moderate at best. The strongest observed trend is that selective institutions tend to have higher completion rates, which may reflect student preparedness and institutional rigor. Other patterns, such as those involving cost, income, or debt, suggest broad variability but limited predictive strength between factors.
Full Analysis Code
``` r
# Load data
college <- read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)
## Name State ID Main
## 1 Alabama A & M University AL 100654 1
## 2 University of Alabama at Birmingham AL 100663 1
## 3 Amridge University AL 100690 1
## 4 University of Alabama in Huntsville AL 100706 1
## 5 Alabama State University AL 100724 1
## 6 The University of Alabama AL 100751 1
## Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
## MainDegree HighDegree Control Region Locale Latitude Longitude AdmitRate
## 1 3 4 Public Southeast City 34.78337 -86.56850 0.9027
## 2 3 4 Public Southeast City 33.50570 -86.79935 0.9181
## 3 3 4 Private Southeast City 32.36261 -86.17401 NA
## 4 3 4 Public Southeast City 34.72456 -86.64045 0.8123
## 5 3 4 Public Southeast City 32.36432 -86.29568 0.9787
## 6 3 4 Public Southeast City 33.21187 -87.54598 0.5330
## MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1 18 929 0 4824 2.5 90.7 0.9 0.2 5.6 6.6
## 2 25 1195 0 12866 57.8 25.9 3.3 5.9 7.1 25.2
## 3 NA NA 1 322 7.1 14.3 0.6 0.3 77.6 54.4
## 4 28 1322 0 6917 74.2 10.7 4.6 4.0 6.5 15.0
## 5 18 935 0 4189 1.5 93.8 1.0 0.3 3.5 7.7
## 6 28 1278 0 32387 78.5 10.1 4.7 1.2 5.6 7.9
## NetPrice Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1 15184 22886 9857 18236 9227 7298 6983
## 2 17535 24129 8328 19032 11612 17235 10640
## 3 9649 15080 6900 6900 14738 5265 3866
## 4 19986 22108 10280 21480 8727 9748 9391
## 5 12874 19413 11068 19396 9003 7983 7399
## 6 21973 28836 10780 28100 13574 10894 10016
## FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1 71.3 71.0 23.96 1068 56.4 36.6 23.6
## 2 89.9 35.3 52.92 3755 63.9 34.1 34.5
## 3 100.0 74.2 18.18 109 64.9 51.3 15.0
## 4 64.6 27.7 48.62 1347 47.6 31.0 44.8
## 5 54.2 73.8 27.69 1294 61.3 34.3 22.1
## 6 74.0 18.0 67.87 6430 61.5 22.6 66.7
# Q1 (question 4): mean percentage of Pell Grant recipients across schools
mean(college$Pell, na.rm = TRUE)
## [1] 37.85296
# Q2 (question 5): correlation between admission rate and completion rate
cor(college$AdmitRate, college$CompRate, use = "complete.obs")
## [1] -0.3482341
# Q3 (question 6): median of the schools' average SAT scores
median(college$AvgSAT, na.rm = TRUE)
## [1] 1121
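# Q4 (question 7): variance and standard deviation of median family income (in $1,000s)
var(college$MedIncome, na.rm = TRUE)
## [1] 522.4814
sd(college$MedIncome, na.rm = TRUE)
## [1] 22.85785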
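# Q5 (question 8): distribution of faculty salaries
hist(college$FacSalary, main = "Histogram of Faculty Salaries", xlab = "Faculty Salary", col = "blue")
# Q6 (question 10): distribution of median family income (in $1,000s)
hist(college$MedIncome, main = "Histogram of Median Income", xlab = "Income", col = "red")
# Q7 (question 11): distribution of admission rates (stored as proportions from 0 to 1)
hist(college$AdmitRate, main = "Histogram of Admission Rate", xlab = "Admission rate", col = "green")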
# Q8 (question 13): mean in-state vs out-of-state tuition (column is spelled TuitonOut in the data)
avg_in <- mean(college$TuitionIn, na.rm = TRUE)
avg_out <- mean(college$TuitonOut, na.rm = TRUE)
avg_tuition <- c("In-State" = avg_in, "Out-of-State" = avg_out)
barplot(avg_tuition,
col = c("skyblue", "orange"),
main = "Average Tuition: In-State vs Out-of-State",
ylab = "Average Tuition ($)",
ylim = c(0, max(avg_tuition) * 1.2))
# Q9 (question 16): correlation between median family income and average debt
cor(college$MedIncome, college$Debt, use = "complete.obs")
## [1] -0.1207221
# Q10 (question 18): enrollment vs admission rate, with a fitted regression line
plot(college$Enrollment, college$AdmitRate,
xlab = "Enrollment", ylab = "Admit Rate",
main = "Enrollment vs Admit Rate")
abline(lm(AdmitRate ~ Enrollment, data = college), col = "red")
cor(college$Enrollment, college$AdmitRate, use = "complete.obs")
## [1] -0.06181281