For this project, I looked at data collected by the U.S. Department of Education as part of the College Scorecard project. The dataset includes information on colleges and universities that grant four-year bachelor’s degrees. After looking at the provided data, these are the 10 questions I’ll be exploring:
college <- read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)
## Name State ID Main
## 1 Alabama A & M University AL 100654 1
## 2 University of Alabama at Birmingham AL 100663 1
## 3 Amridge University AL 100690 1
## 4 University of Alabama in Huntsville AL 100706 1
## 5 Alabama State University AL 100724 1
## 6 The University of Alabama AL 100751 1
## Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
## MainDegree HighDegree Control Region Locale Latitude Longitude AdmitRate
## 1 3 4 Public Southeast City 34.78337 -86.56850 0.9027
## 2 3 4 Public Southeast City 33.50570 -86.79935 0.9181
## 3 3 4 Private Southeast City 32.36261 -86.17401 NA
## 4 3 4 Public Southeast City 34.72456 -86.64045 0.8123
## 5 3 4 Public Southeast City 32.36432 -86.29568 0.9787
## 6 3 4 Public Southeast City 33.21187 -87.54598 0.5330
## MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1 18 929 0 4824 2.5 90.7 0.9 0.2 5.6 6.6
## 2 25 1195 0 12866 57.8 25.9 3.3 5.9 7.1 25.2
## 3 NA NA 1 322 7.1 14.3 0.6 0.3 77.6 54.4
## 4 28 1322 0 6917 74.2 10.7 4.6 4.0 6.5 15.0
## 5 18 935 0 4189 1.5 93.8 1.0 0.3 3.5 7.7
## 6 28 1278 0 32387 78.5 10.1 4.7 1.2 5.6 7.9
## NetPrice Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1 15184 22886 9857 18236 9227 7298 6983
## 2 17535 24129 8328 19032 11612 17235 10640
## 3 9649 15080 6900 6900 14738 5265 3866
## 4 19986 22108 10280 21480 8727 9748 9391
## 5 12874 19413 11068 19396 9003 7983 7399
## 6 21973 28836 10780 28100 13574 10894 10016
## FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1 71.3 71.0 23.96 1068 56.4 36.6 23.6
## 2 89.9 35.3 52.92 3755 63.9 34.1 34.5
## 3 100.0 74.2 18.18 109 64.9 51.3 15.0
## 4 64.6 27.7 48.62 1347 47.6 31.0 44.8
## 5 54.2 73.8 27.69 1294 61.3 34.3 22.1
## 6 74.0 18.0 67.87 6430 61.5 22.6 66.7
Now that we have our questions, we’re going to analyze each one to come up with an answer:
q1_mean <- mean(college$AdmitRate, na.rm = TRUE)
q1_median <- median(college$AdmitRate, na.rm = TRUE)
q1_sd <- sd(college$AdmitRate, na.rm = TRUE)
q1_range <- range(college$AdmitRate, na.rm = TRUE)
q1_mean; q1_median; q1_sd; q1_range
## [1] 0.6702025
## [1] 0.69505
## [1] 0.208179
## [1] 0 1
hist(college$AdmitRate, main = "Admission Rate (Distribution)", xlab = "Admission Rate", col = "pink")
Based on the histogram, we can see that the majority of US colleges have an acceptance rate between 60% and 80%. This suggests that most institutions are relatively accessible to applicants, with fewer colleges being very selective. In general, colleges are more likely to have higher acceptance rates than lower ones. Out of curiosity, I looked up SCSU and found that our acceptance rate is around 95%, which is above the typical range but consistent with the overall tendency toward high acceptance rates.
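The claim that most colleges fall between 60% and 80% can be checked directly rather than read off the histogram. The following is a minimal sketch, assuming a hypothetical helper `share_between()` and made-up toy values; neither is part of the original analysis.

```r
# Hypothetical helper: share of non-missing values that lie in [lower, upper].
share_between <- function(x, lower, upper) {
  x <- x[!is.na(x)]
  mean(x >= lower & x <= upper)
}

# Toy admission rates (made up for illustration, not taken from the dataset).
toy_admit <- c(0.35, 0.62, 0.68, 0.71, 0.79, 0.91, NA)
share_between(toy_admit, 0.60, 0.80)

# With the real data the call would be:
# share_between(college$AdmitRate, 0.60, 0.80)
```

If the share for `college$AdmitRate` comes out above 0.5, "majority" is justified; if not, "the most common range" would be a safer description.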
q2_var <- var(college$Cost, na.rm = TRUE)
q2_sd <- sd(college$Cost, na.rm = TRUE)
q2_rng <- range(college$Cost, na.rm = TRUE)
q2_iqr <- IQR(college$Cost, na.rm = TRUE)
q2_var; q2_sd; q2_rng; q2_iqr
## [1] 233433900
## [1] 15278.54
## [1] 5950 72717
## [1] 23519.25
hist(college$Cost, main = "Cost", xlab = "Cost", col = "blue")
The total cost of attending a four-year college varies widely across the US. Most colleges fall within a moderate range towards the lower end of the spectrum, but a few have much higher costs, as shown by the histogram above. This indicates that while college can be affordable for many, some institutions charge significantly more.
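One way to put numbers on "most colleges toward the lower end, a few much higher" is to compare percentiles: if the gap from the median to the 90th percentile is larger than the gap from the 10th percentile to the median, the upper tail is longer. This is only a sketch, with a hypothetical helper `cost_percentiles()` and made-up toy costs.

```r
# Hypothetical helper: selected percentiles of a numeric vector, ignoring NAs.
cost_percentiles <- function(x, probs = c(0.10, 0.50, 0.90)) {
  quantile(x, probs = probs, na.rm = TRUE)
}

# Toy costs (made up for illustration, not taken from the dataset).
toy_cost <- c(12000, 15000, 18000, 21000, 24000, 30000, 65000, NA)
p <- cost_percentiles(toy_cost)
p

# Compare the lower and upper gaps around the median.
lower_gap <- unname(p["50%"] - p["10%"])
upper_gap <- unname(p["90%"] - p["50%"])
c(lower_gap = lower_gap, upper_gap = upper_gap)

# With the real data the call would be:
# cost_percentiles(college$Cost)
```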
q3_summary <- summary(college$CompRate)
q3_summary
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 38.18 52.45 52.14 66.67 100.00 167
boxplot(college$CompRate, horizontal = TRUE, main = "Completion Rate — Boxplot", col = "purple")
Looking at the above boxplot, we can see that the completion rate among four-year colleges ranges from 0% to 100%, with most schools graduating about 40–70% of their students. The median completion rate is around 52%, meaning that about half of all institutions graduate at least 52% of their students. The boxplot shows a fairly symmetrical distribution (the mean, 52.14, is almost equal to the median, 52.45), suggesting that completion rates are spread about equally above and below the median, with no extreme outliers.
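Whether a boxplot has outliers can be confirmed in code, since `boxplot()` marks points beyond 1.5 times the box length from the hinges. The sketch below wraps base R's `boxplot.stats()` in a hypothetical helper `flag_outliers()` and runs it on made-up toy values.

```r
# Hypothetical helper: values that a default boxplot would draw as outliers
# (beyond 1.5 times the box length from the hinges).
flag_outliers <- function(x) {
  boxplot.stats(x[!is.na(x)])$out
}

# Toy completion rates (made up for illustration, not taken from the dataset).
toy_comp <- c(38, 45, 50, 52, 55, 60, 67, NA)
flag_outliers(toy_comp)           # no values flagged for this toy vector
flag_outliers(c(toy_comp, 100))   # the added value 100 is flagged

# With the real data the call would be:
# length(flag_outliers(college$CompRate))
```

A result of length zero for `college$CompRate` would support the "no extreme outliers" reading; anything else would call for softer wording.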
q4_mean <- mean(college$Debt, na.rm = TRUE)
q4_median <- median(college$Debt, na.rm = TRUE)
q4_sd <- sd(college$Debt, na.rm = TRUE)
q4_rng <- range(college$Debt, na.rm = TRUE)
q4_mean; q4_median; q4_sd; q4_rng
## [1] 2365.655
## [1] 713.5
## [1] 5360.986
## [1] 10 48216
boxplot(college$Debt, horizontal = TRUE, main = "Debt — Boxplot", col = "green")
The boxplot for average student debt shows a strongly right-skewed distribution. This means that while most institutions report relatively low debt, a few have students with extremely high loan balances. The mean debt, $2,366, is much higher than the median, $714, further confirming this skew. The range of debt spans from $10 to over $48,000, highlighting how financial burdens can differ drastically between colleges.
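Comparing the mean with the median is one informal check for skew; a moment-based skewness coefficient is another. Base R has no built-in skewness function, so the sketch below defines a hypothetical `sample_skewness()` helper (third central moment divided by the 1.5 power of the second). The toy vectors are made up.

```r
# Hypothetical helper: moment-based sample skewness, ignoring NAs.
# Positive values indicate a longer right tail, values near 0 indicate symmetry.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Toy vectors (made up for illustration, not taken from the dataset).
toy_skewed    <- c(1, 2, 3, 4, 10)
toy_symmetric <- c(1, 2, 3, 4, 5)

sample_skewness(toy_skewed)      # positive: right-skewed
sample_skewness(toy_symmetric)   # zero: symmetric

# With the real data the call would be:
# sample_skewness(college$Debt)
```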
plot(college$AdmitRate, college$AvgSAT,
xlab = "Admission Rate", ylab = "Average SAT Scores",
main = "Admission Rate vs Average SAT Scores")
q5_cor <- cor(college$AdmitRate, college$AvgSAT, use = "complete.obs")
q5_cor
## [1] -0.4221255
The scatterplot shows a moderate negative correlation (r = -0.42) between admission rate and average SAT scores. This means that more selective colleges tend to have higher average SAT scores, while schools with high acceptance rates typically have lower SAT averages. This relationship aligns with expectations because highly selective colleges, such as those in the Ivy League, have lower acceptance rates and admit students with higher SAT scores.
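A single correlation coefficient says nothing about its own uncertainty. Base R's `cor.test()` reports a confidence interval and p-value for a Pearson correlation and drops incomplete pairs on its own. The sketch below uses simulated toy data (the intercept, slope, and noise level are arbitrary assumptions) rather than the real columns.

```r
# Simulated toy data (arbitrary assumptions, not taken from the dataset):
# SAT falls as the admission rate rises, plus random noise.
set.seed(1)
toy_admit <- runif(50, min = 0.2, max = 1)
toy_sat   <- 1400 - 400 * toy_admit + rnorm(50, sd = 60)
toy_sat[3] <- NA   # one missing value, to show that incomplete pairs are dropped

res <- cor.test(toy_admit, toy_sat)   # Pearson correlation by default
res$estimate    # sample correlation
res$conf.int    # 95% confidence interval
res$parameter   # degrees of freedom (complete pairs minus 2)

# With the real data the call would be:
# cor.test(college$AdmitRate, college$AvgSAT)
```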
plot(college$MedIncome, college$NetPrice,
xlab = "Median Family Income (in $1,000)",
ylab = "Net Price",
main = "Median Income vs Net Price")
q6_cor <- cor(college$MedIncome, college$NetPrice, use = "complete.obs")
q6_cor
## [1] 0.5151298
The scatterplot shows a moderate positive correlation (r = 0.52) between median family income and net price. This suggests that colleges with students from higher-income families tend to have higher net prices, while those serving lower-income populations are generally more affordable. Although the data show some variation, the upward trend indicates that family income and college cost are meaningfully related.
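The correlation describes how tightly the points follow a line but not how steep that line is. A simple linear regression gives the slope in dollars of net price per $1,000 of median family income. This is a sketch on a made-up toy data frame; the column names match the dataset, the values do not.

```r
# Toy data frame (values made up for illustration, not taken from the dataset).
toy <- data.frame(
  MedIncome = c(20, 30, 40, 50, 60),               # in $1,000
  NetPrice  = c(12000, 15000, 17000, 21000, 23000)
)

fit <- lm(NetPrice ~ MedIncome, data = toy)
slope <- unname(coef(fit)["MedIncome"])
slope   # extra dollars of net price per additional $1,000 of income

# With the real data the call would be:
# coef(lm(NetPrice ~ MedIncome, data = college))
```

On the real data, `lm()` drops rows with missing values in either column by default, which is the same set of rows that `use = "complete.obs"` keeps for the correlation.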
boxplot(NetPrice ~ Control, data = college,
main = "Net Price by Control",
xlab = "Control", ylab = "NetPrice")
q7_means <- tapply(college$NetPrice, college$Control, mean, na.rm = TRUE)
q7_medians <- tapply(college$NetPrice, college$Control, median, na.rm = TRUE)
q7_means; q7_medians
## Private Profit Public
## 22259.02 23309.99 14295.06
## Private Profit Public
## 21836 23179 14376
The boxplot comparing net price by control type shows that public colleges have the lowest overall costs, with a median net price of about $14,000, while private and for-profit institutions are considerably more expensive, averaging around $22,000–$23,000. The spread of data is wider for private and for-profit schools, indicating large differences in pricing within those groups. This suggests that public colleges are generally more affordable and consistent in cost, while private and for-profit institutions vary more and tend to charge higher prices.
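The comparison of spread between control types can be backed with a number per group, using the same `tapply()` pattern as the means and medians. A minimal sketch with a made-up toy data frame:

```r
# Toy data frame (values made up for illustration, not taken from the dataset).
toy <- data.frame(
  Control  = c("Public", "Public", "Public", "Private", "Private", "Private"),
  NetPrice = c(13000, 14000, 15000, 12000, 22000, 35000)
)

# Interquartile range of net price within each control type.
spread <- tapply(toy$NetPrice, toy$Control, IQR, na.rm = TRUE)
spread

# With the real data the call would be:
# tapply(college$NetPrice, college$Control, IQR, na.rm = TRUE)
```

The IQR is used here instead of the standard deviation because it matches the box length shown in the boxplot and is less affected by a few extreme schools.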
q8_mean <- mean(college$Female, na.rm = TRUE)
q8_sd <- sd(college$Female, na.rm = TRUE)
q8_rng <- range(college$Female, na.rm = TRUE)
q8_mean; q8_sd; q8_rng
## [1] 59.29588
## [1] 12.34421
## [1] 11.8 98.0
hist(college$Female, main = "Female Percentage", xlab = "Female (%)", col = "pink")
The histogram shows that most four-year colleges have between 55% and 65% female students, indicating a slight majority of women in higher education. The distribution is fairly symmetrical, suggesting that gender representation is consistent across most institutions. Very few schools have extremely high or low female enrollment, meaning most colleges maintain a relatively balanced student population.
tuition_diff <- college$TuitonOut - college$TuitionIn
q9_mean <- mean(tuition_diff, na.rm = TRUE)
q9_median <- median(tuition_diff, na.rm = TRUE)
q9_sd <- sd(tuition_diff, na.rm = TRUE)
q9_rng <- range(tuition_diff, na.rm = TRUE)
q9_mean; q9_median; q9_sd; q9_rng
## [1] 3388.112
## [1] 0
## [1] 6081.32
## [1] 0 32650
hist(tuition_diff, main = "Tuition Difference", xlab = "Out - In", col = "brown")
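Because the median difference printed above is 0 and the minimum is also 0, at least half of the colleges charge the same tuition to in-state and out-of-state students. A rough sketch of how to measure that share, overall and by control type, follows. The toy data frame is made up, and the idea that the gap might sit mostly with public institutions is an assumption to check, not a result from this report. The column name `TuitonOut` is spelled as it appears in the dataset.

```r
# Toy data frame (values made up for illustration, not taken from the dataset).
toy <- data.frame(
  Control   = c("Public", "Public", "Private", "Private", "Profit"),
  TuitionIn = c(9000, 10000, 30000, 25000, 15000),
  TuitonOut = c(18000, 21000, 30000, 25000, 15000)
)

toy_diff <- toy$TuitonOut - toy$TuitionIn

# Share of colleges with no in-state/out-of-state gap, overall and by group.
share_zero <- mean(toy_diff == 0, na.rm = TRUE)
share_zero_by_control <- tapply(toy_diff == 0, toy$Control, mean, na.rm = TRUE)
share_zero
share_zero_by_control

# With the real data the calls would be:
# mean(tuition_diff == 0, na.rm = TRUE)
# tapply(tuition_diff == 0, college$Control, mean, na.rm = TRUE)
```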
The histogram above shows that most colleges have little or no gap between in-state and out-of-state tuition, but a few institutions charge non-residents significantly more. The median difference is $0 and the mean is about $3,388, with some colleges charging up to $32,650 more for out-of-state students. This strongly right-skewed distribution suggests that while most schools keep tuition consistent, some universities have extremely high tuition rates for non-residents.

### Q10: Is there a correlation between faculty salary and completion rate?
plot(college$FacSalary, college$CompRate,
xlab = "Faculty Salary (monthly)",
ylab = "Completion Rate",
main = "Faculty Salary vs Completion Rate")
q10_cor <- cor(college$FacSalary, college$CompRate, use = "complete.obs")
q10_cor
## [1] 0.577221
The scatterplot shows a moderate to strong positive correlation (r = 0.58) between faculty salary and completion rate. Colleges that pay their faculty more tend to have higher student success rates. This suggests that institutions with better resources attract experienced faculty and provide stronger academic support to students. While correlation does not imply causation, the trend suggests that investment in faculty may contribute to improved student outcomes.
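Since correlation does not imply causation, one simple probe is to see whether the relationship holds within each control type or only appears when all colleges are pooled. The sketch below defines a hypothetical helper `cor_by_group()` and runs it on a made-up toy data frame; it is an exploratory check, not a causal analysis.

```r
# Hypothetical helper: Pearson correlation of columns x and y within each
# level of a grouping column, using complete pairs only.
cor_by_group <- function(df, x, y, group) {
  sapply(split(df, df[[group]]), function(g) {
    cor(g[[x]], g[[y]], use = "complete.obs")
  })
}

# Toy data frame (values made up for illustration, not taken from the dataset).
toy <- data.frame(
  Control   = rep(c("Private", "Public"), each = 4),
  FacSalary = c(6000, 7000, 8000, 9000, 7000, 8000, 9000, 10000),
  CompRate  = c(40, 50, 55, 70, 45, 50, 60, 62)
)

r_by_control <- cor_by_group(toy, "FacSalary", "CompRate", "Control")
r_by_control

# With the real data the call would be:
# cor_by_group(college, "FacSalary", "CompRate", "Control")
```

If the within-group correlations on the real data are much weaker than the pooled r = 0.58, part of the pooled relationship may reflect differences between types of institution rather than faculty pay itself.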
After analyzing the CollegeScores4yr dataset, I noticed a few clear patterns about colleges in the US. Most schools have acceptance rates between 60% and 80%, meaning they’re fairly open to applicants. Public universities also tend to be the most affordable, while private and for-profit schools cost noticeably more and vary a lot in price. The median completion rate is around 52%, so about half of all colleges graduate at least 52% of their students, and while student debt is usually low, there are a few schools where the average debt is really high. I also found that more selective schools usually have higher SAT scores, and colleges whose students come from higher-income families tend to have higher net prices. Most schools have between 55% and 65% female students, showing a slight majority of women. When comparing in-state and out-of-state tuition, most schools don’t have a big difference, but some charge non-residents a lot more. Finally, colleges with higher faculty salaries also tend to have better student completion rates. Overall, these trends suggest that cost, selectivity, and resources are associated with student outcomes across different types of institutions.