Introduction

In this project I analyze the CollegeScores4yr data set, which describes four-year colleges. To better understand what the data show, I propose the following 10 questions:

(1). What is the mean cost of all of the colleges in the data set?

(2). What is the correlation between cost and average SAT score?

(3). What is the distribution of cost?

(4). What is the average percentage of Pell Grant recipients?

(5). What is the correlation between admission and completion rate?

(6). What is the median of average SAT scores?

(7). What is the variance of median family income across all colleges?

(8). What is the distribution of faculty salaries?

(9). What is the mean faculty salary in each U.S. region?

(10). What is the distribution of median family income?

To further my understanding and to open my mind to some different ideas, I asked ChatGPT for 10 questions of its own about the same data set. It proposed the following:

(11). Create a histogram of the admission rate (AdmitRate).

(12). Create a boxplot of average SAT scores (AvgSAT) grouped by region (Region).

(13). Create a barplot comparing the average tuition for in-state (TuitionIn) vs out-of-state (TuitonOut) students.

(14). Create a pie chart showing the proportion of schools by control type (Control).

(15). Is there a correlation between admission rate (AdmitRate) and completion rate (CompRate)?

(16). How strongly does median family income (MedIncome) correlate with average debt (Debt)?

(17). What is the relationship (scatterplot & correlation) between net price (NetPrice) and completion rate (CompRate)?

(18). Do colleges with higher enrollment (Enrollment) tend to have higher or lower admit rates (AdmitRate)?

(19). Compare mean ACT score (MidACT) between online (Online) and in-person colleges.

(20). Create a pie chart showing the distribution of colleges by highest degree offered (HighDegree).

From the combined list of 20 questions, I selected the 10 I found most interesting to analyze: 4, 5, 6, 7, 8, 10, 11, 13, 16, and 18. They are renumbered Q1 through Q10 below.

Analysis

Q1: What is the average percentage of Pell Grant recipients?

## Average percentage of recipients: 37.85296

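Averaged across colleges, about 37.9% of students receive a Pell Grant. This is the mean of the school-level percentages, so every college counts equally regardless of its size; it is not the share of all students, or of all applicants, who receive the grant. At a typical college in the data set, a little more than one in three students receives one.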
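To illustrate the difference between a school-level mean and a student-level percentage, here is a minimal sketch that uses only the first six rows shown in the appendix's head() output. The name toy and the choice to weight by Enrollment are assumptions made for illustration; they are not part of the original analysis, and six rows are not representative of the full data set.

```r
# First six rows of the data set, copied from the head() output in the appendix
toy <- data.frame(
  Enrollment = c(4824, 12866, 322, 6917, 4189, 32387),
  Pell       = c(71.0, 35.3, 74.2, 27.7, 73.8, 18.0)
)

# Unweighted: every college counts equally (the same calculation as mean(college$Pell))
school_mean <- mean(toy$Pell, na.rm = TRUE)

# Weighted by enrollment: approximates the share of all enrolled students with a Pell Grant
student_mean <- weighted.mean(toy$Pell, w = toy$Enrollment, na.rm = TRUE)

c(school_level = school_mean, student_level = student_mean)
```

In these six rows the largest college has the lowest Pell percentage, so the enrollment-weighted figure comes out lower than the simple mean.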

Q2: What is the correlation between admission and completion rate?

## Correlation between admission and completion rate: -0.3482341

A correlation of -0.35 is a moderate negative relationship: schools that are more selective (lower admission rates) tend to have higher completion rates. This is plausible, since schools with high admission standards are more likely to enroll students with strong academic preparation, motivation, or work ethic. The correlation alone does not show that selectivity causes higher completion.
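As a rough sketch of how the correlation is computed when some values are missing, the example below uses the six rows from the appendix's head() output. The names toy, r_toy, n_used, r_full, and r_squared are illustrative. Squaring the full-data correlation is an added step, not part of the original analysis, that gives a sense of how much of the variation is shared.

```r
# First six rows of the data set, copied from the head() output in the appendix
toy <- data.frame(
  AdmitRate = c(0.9027, 0.9181, NA, 0.8123, 0.9787, 0.5330),
  CompRate  = c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87)
)

# use = "complete.obs" drops every row with an NA in either column (row 3 here)
r_toy  <- cor(toy$AdmitRate, toy$CompRate, use = "complete.obs")
n_used <- sum(complete.cases(toy))

# Squaring the full-data correlation reported above gives the share of variance
# in completion rate that is linearly associated with admission rate
r_full    <- -0.3482341
r_squared <- r_full^2

round(c(r_toy = r_toy, n_used = n_used, r_squared = r_squared), 3)
```

With a squared correlation of roughly 0.12, admission rate accounts for only about an eighth of the variation in completion rate, which fits the description of the relationship as moderate rather than strong.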

Q3: What is the median of average SAT scores?

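## Median of average SAT score: 1121

The median of the colleges' average SAT scores is 1121, as computed by median() in the appendix. This is below 1200, a score often described as “good”. Because this is a median of school-level averages, it means that half of the colleges reporting SAT data have an average score at or below 1121; it does not describe how individual test takers scored.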
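Because the median and the mean of the same column can differ noticeably, it can help to compute both side by side. The sketch below does so for the six AvgSAT values in the appendix's head() output, one of which is missing. The vector name avg_sat is illustrative, and six rows are not representative of the full data set.

```r
# AvgSAT for the first six rows, copied from the head() output in the appendix
avg_sat <- c(929, 1195, NA, 1322, 935, 1278)

# na.rm = TRUE drops the missing value before either statistic is computed
sat_median <- median(avg_sat, na.rm = TRUE)  # middle value of the sorted scores
sat_mean   <- mean(avg_sat, na.rm = TRUE)    # arithmetic average

c(median = sat_median, mean = sat_mean)
```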

Q4: What is the variance of median family income across all colleges?

## Variance of MedIncome: 522.4814
## Standard deviation: 22.85785

The variance of median family income is 522.48. Because MedIncome is recorded in thousands of dollars, the variance is in squared thousands of dollars, which is hard to interpret directly. Taking the square root gives a standard deviation of about 22.86, meaning that a college's median family income typically differs from the overall mean by roughly $22,860.
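The arithmetic can be checked directly. This sketch plugs in the variance reported above rather than reloading the data; the variable names are illustrative.

```r
# Variance reported above, in (thousands of dollars)^2
var_medincome <- 522.4814

# The standard deviation is the square root of the variance (back in thousands of dollars)
sd_thousands <- sqrt(var_medincome)

# Convert from thousands of dollars to dollars
sd_dollars <- sd_thousands * 1000

round(sd_thousands, 2)
round(sd_dollars)
```

On the full data set, sd(college$MedIncome, na.rm = TRUE) should give the same standard deviation without the intermediate step.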

Q5: What is the distribution of faculty salaries?

The histogram shows that most colleges have an average faculty salary (FacSalary) between 4,000 and 9,000. The tallest bars, and therefore the most common values, fall between 6,000 and 8,000.
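Bin counts can also be read from the histogram object instead of being estimated by eye. The sketch below is one way to do that, shown on the six FacSalary values from the appendix's head() output. The 2,000-wide breaks and the names fac_salary, h, and bins are assumptions for illustration, so the bins may not match the ones R chose for the full-data plot.

```r
# FacSalary for the first six rows, copied from the head() output in the appendix
fac_salary <- c(6983, 10640, 3866, 9391, 7399, 10016)

# plot = FALSE returns the histogram object without drawing it
h <- hist(fac_salary, breaks = seq(2000, 12000, by = 2000), plot = FALSE)

# One row per bin: lower edge, upper edge, and number of colleges in the bin
bins <- data.frame(
  lower = head(h$breaks, -1),
  upper = tail(h$breaks, -1),
  count = h$counts
)

# Show the bin or bins with the highest count
bins[bins$count == max(bins$count), ]
```

The same pattern applied to college$FacSalary gives the exact counts behind each bar.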

Q6: What is the distribution of median family income?

According to the histogram, median family income is most commonly between 25 and 50 thousand dollars, and the fewest colleges fall above 100 thousand dollars. The distribution is therefore right-skewed: at most colleges the typical student comes from a lower- or middle-income family, and only a few colleges draw mainly from high-income families.

Q7: What is the distribution of admission rates?

The histogram shows the frequency distribution of admission rates. The highest frequencies are between 60 and 80 percent. Admission rates become steadily less common below that range, and rates above 80 percent also become less frequent. This suggests that colleges most commonly admit around 60 to 80 percent of applicants.

Q8: How does the average tuition for in-state vs out-of-state students compare?

The bar plot compares the mean tuition for in-state students with the mean tuition for out-of-state students. On average, out-of-state tuition is about 2,500 dollars higher than in-state tuition. This suggests that, in terms of tuition, it is more expensive to attend school out of state.
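The gap can also be computed as a number rather than read off the bars. The following sketch uses the six rows from the appendix's head() output; the names toy, gap_of_means, and per_school_gap are illustrative, and looking at the per-school gap is an addition to the original analysis.

```r
# First six rows of the data set, copied from the head() output in the appendix
toy <- data.frame(
  TuitionIn = c(9857, 8328, 6900, 10280, 11068, 10780),
  TuitonOut = c(18236, 19032, 6900, 21480, 19396, 28100)  # column name spelled as in the data set
)

# Difference between the two bar heights
gap_of_means <- mean(toy$TuitonOut, na.rm = TRUE) - mean(toy$TuitionIn, na.rm = TRUE)

# Gap at each individual college
per_school_gap <- toy$TuitonOut - toy$TuitionIn

gap_of_means
summary(per_school_gap)
```

In these six rows the one private college charges the same tuition to both groups, so its gap is zero. If that holds for many private colleges, it is one possible reason the overall average gap is much smaller than the gap at the public colleges shown here.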

Q9: How strongly does median family income correlate with average debt?

## Correlation between family income and debt: -0.1207221

A correlation of -0.121 is a weak negative relationship: colleges with higher median family incomes tend to have slightly lower average student debt. Because the correlation is so weak, income explains little of the variation in debt, and other factors likely matter more than differences in income.

Q10: Do colleges with higher enrollment tend to have higher or lower admit rates?

## Correlation of enrollment and admission rates:  -0.06181281

The correlation of -0.062 and the scatterplot indicate that colleges with larger enrollment have slightly lower admission rates. However, the relationship is so weak that it is almost negligible. This suggests that a larger school is not necessarily more or less selective than a smaller one.
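Enrollment figures range from a few hundred to tens of thousands, and a handful of very large schools can dominate a Pearson correlation. As a hedged cross-check that is not part of the original analysis, the sketch below also computes Spearman's rank correlation, which depends only on the ordering of the values. It uses the six rows from the appendix's head() output, and the names toy, pearson, and spearman are illustrative.

```r
# First six rows of the data set, copied from the head() output in the appendix
toy <- data.frame(
  Enrollment = c(4824, 12866, 322, 6917, 4189, 32387),
  AdmitRate  = c(0.9027, 0.9181, NA, 0.8123, 0.9787, 0.5330)
)

# Pearson correlation: measures linear association, sensitive to extreme values
pearson <- cor(toy$Enrollment, toy$AdmitRate, use = "complete.obs")

# Spearman correlation: the same calculation on ranks, less sensitive to extreme values
spearman <- cor(toy$Enrollment, toy$AdmitRate,
                use = "complete.obs", method = "spearman")

c(pearson = pearson, spearman = spearman)
```

With only five complete rows these numbers say nothing about the full data set; the point is only that adding method = "spearman" to the existing cor() call gives a second view of the same relationship.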

Summary

This analysis explored various factors related to four-year college statistics, focusing on cost, admissions, completion, income, debt, and faculty data. Ten key questions were investigated using descriptive statistics, correlations, and visualizations.

  1. Pell Grant Recipients – Averaged across colleges, about 37.9% of students received Pell Grants, so at a typical college a little more than one in three students benefits from this federal aid.

  2. Admission vs Completion Rate – A moderate negative correlation (-0.35) suggests that more selective colleges (lower admission rates) tend to have higher completion rates.

  3. Average SAT Scores – The median of the schools' average SAT scores was 1121 (the median() output in the appendix), so half of the colleges reporting SAT data have an average score at or below that value.

  4. Median Family Income Variance – The variance was 522.48 (in squared thousands of dollars), corresponding to a standard deviation of about $22,860, showing notable variation in family income across colleges.

  5. Faculty Salaries – Most colleges have average faculty salaries between $4,000 and $9,000, with the most common values between $6,000 and $8,000 and fewer extreme values on either end of the scale.

  6. Median Family Income Distribution – Median family income is most commonly between $25,000 and $50,000, and few colleges have a median family income above $100,000.

  7. Admission Rate Distribution – The majority of schools have admission rates between 60% and 80%, suggesting that most institutions are moderately selective.

  8. Tuition Comparison – Out-of-state tuition was approximately $2,500 higher on average than in-state tuition, a financial advantage for in-state students.

  9. Family Income vs Debt – A weak negative correlation (-0.12) indicates that colleges with higher median family incomes tend to have slightly lower average student debt, though the relationship is minimal.

  10. Enrollment vs Admission Rate – The correlation (-0.06) was nearly zero, implying that college size has little to no relationship with selectivity.

Overall, the data set shows that while economic and academic variables are interrelated, most relationships are weak or moderate at best. The strongest observed trend is that selective institutions tend to have higher completion rates, which may reflect student preparedness and institutional rigor. Other patterns, such as those involving cost, income, or debt, suggest broad variability but limited predictive strength between factors.

Appendix

Full Analysis Code


``` r
# Load data
college <- read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)
##                                  Name State     ID Main
## 1            Alabama A & M University    AL 100654    1
## 2 University of Alabama at Birmingham    AL 100663    1
## 3                  Amridge University    AL 100690    1
## 4 University of Alabama in Huntsville    AL 100706    1
## 5            Alabama State University    AL 100724    1
## 6           The University of Alabama    AL 100751    1
##                                                                Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
##   MainDegree HighDegree Control    Region Locale Latitude Longitude AdmitRate
## 1          3          4  Public Southeast   City 34.78337 -86.56850    0.9027
## 2          3          4  Public Southeast   City 33.50570 -86.79935    0.9181
## 3          3          4 Private Southeast   City 32.36261 -86.17401        NA
## 4          3          4  Public Southeast   City 34.72456 -86.64045    0.8123
## 5          3          4  Public Southeast   City 32.36432 -86.29568    0.9787
## 6          3          4  Public Southeast   City 33.21187 -87.54598    0.5330
##   MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1     18    929      0       4824   2.5  90.7      0.9   0.2   5.6      6.6
## 2     25   1195      0      12866  57.8  25.9      3.3   5.9   7.1     25.2
## 3     NA     NA      1        322   7.1  14.3      0.6   0.3  77.6     54.4
## 4     28   1322      0       6917  74.2  10.7      4.6   4.0   6.5     15.0
## 5     18    935      0       4189   1.5  93.8      1.0   0.3   3.5      7.7
## 6     28   1278      0      32387  78.5  10.1      4.7   1.2   5.6      7.9
##   NetPrice  Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1    15184 22886      9857     18236       9227        7298      6983
## 2    17535 24129      8328     19032      11612       17235     10640
## 3     9649 15080      6900      6900      14738        5265      3866
## 4    19986 22108     10280     21480       8727        9748      9391
## 5    12874 19413     11068     19396       9003        7983      7399
## 6    21973 28836     10780     28100      13574       10894     10016
##   FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1        71.3 71.0    23.96 1068   56.4     36.6      23.6
## 2        89.9 35.3    52.92 3755   63.9     34.1      34.5
## 3       100.0 74.2    18.18  109   64.9     51.3      15.0
## 4        64.6 27.7    48.62 1347   47.6     31.0      44.8
## 5        54.2 73.8    27.69 1294   61.3     34.3      22.1
## 6        74.0 18.0    67.87 6430   61.5     22.6      66.7
# Q1
mean(college$Pell, na.rm = TRUE)
## [1] 37.85296
# Q2
cor(college$AdmitRate, college$CompRate, use = "complete.obs")
## [1] -0.3482341
# Q3
median(college$AvgSAT, na.rm = TRUE)
## [1] 1121
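# Q4
var(college$MedIncome, na.rm = TRUE)
## [1] 522.4814
sd(college$MedIncome, na.rm = TRUE)  # square root of the variance, in thousands of dollars
## [1] 22.85785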
# Q5
hist(college$FacSalary, main = "Histogram of Faculty Salaries", xlab = "Faculty Salary", col = "blue")

# Q6
hist(college$MedIncome, main = "Histogram of Median Income", xlab = "Income", col = "red")

# Q7
hist(college$AdmitRate, main = "Histogram of Admission Rate", xlab = "Admission rate", col = "green")

# Q8
avg_in <- mean(college$TuitionIn, na.rm = TRUE)
avg_out <- mean(college$TuitonOut, na.rm = TRUE)
avg_tuition <- c("In-State" = avg_in, "Out-of-State" = avg_out)
barplot(avg_tuition,
        col = c("skyblue", "orange"),
        main = "Average Tuition: In-State vs Out-of-State",
        ylab = "Average Tuition ($)",
        ylim = c(0, max(avg_tuition) * 1.2))

# Q9
cor(college$MedIncome, college$Debt, use = "complete.obs")
## [1] -0.1207221
# Q10
plot(college$Enrollment, college$AdmitRate,
     xlab = "Enrollment", ylab = "Admit Rate",
     main = "Enrollment vs Admit Rate")
abline(lm(AdmitRate ~ Enrollment, data = college), col="red")

cor(college$Enrollment, college$AdmitRate, use="complete.obs")
## [1] -0.06181281