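This report investigates ten data-driven questions about colleges using the dataset ‘CollegeScores4yr.csv’. The focus is on applying descriptive statistical tools such as the mean, median, variance, standard deviation, and correlation, along with boxplots and histograms. Two draft lists of candidate questions appear first, followed by the final ten questions that the analysis below answers.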
1. What is the average faculty salary at colleges with strong engineering programs?
2. What is the median graduation rate for public vs. private colleges?
3. What’s the variance in student enrollment across different college types?
4. Is there a correlation between average student debt and median earnings after graduation?
5. What’s the average net price for in-state students compared to out-of-state?
6. What is the standard deviation of SAT scores among selective colleges?
7. Do rural colleges tend to have lower admission rates than urban ones?
8. What is the distribution of student-to-faculty ratios across U.S. colleges?
9. What’s the average percent of first-generation college students at community colleges?
10. How does average tuition compare between HBCUs and non-HBCUs?
1. What is the mean net price across all colleges?
2. What is the median admission rate by region?
3. What is the standard deviation of average faculty salary?
4. What is the variance in undergraduate enrollment?
5. What is the correlation between average SAT scores and admission rates?
6. How does tuition (in-state vs out-of-state) compare using a boxplot?
7. What is the distribution (histogram) of completion rates across colleges?
8. What is the average percent of female students by region?
9. What is the correlation between student loan debt and net price?
10. Use a barplot to compare median ACT scores across different regions.
1. What is the mean net price across all colleges?
2. What is the median graduation rate for public vs. private colleges?
3. What is the correlation between average SAT scores and admission rates?
4. What is the variance in undergraduate enrollment?
5. What is the average faculty salary at colleges with strong engineering programs?
6. Use a boxplot to compare in-state vs out-of-state tuition.
7. What are the median earnings after graduation in urban vs. rural areas?
8. What is the standard deviation of SAT scores among selective colleges?
9. What is the average percent of female students by region?
10. Use a histogram to show the distribution of completion rates.
------------------------------------------------------------------------
college <- read.csv("CollegeScores4yr.csv")
head(college)
## Name State ID Main
## 1 Alabama A & M University AL 100654 1
## 2 University of Alabama at Birmingham AL 100663 1
## 3 Amridge University AL 100690 1
## 4 University of Alabama in Huntsville AL 100706 1
## 5 Alabama State University AL 100724 1
## 6 The University of Alabama AL 100751 1
## Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
## MainDegree HighDegree Control Region Locale Latitude Longitude AdmitRate
## 1 3 4 Public Southeast City 34.78337 -86.56850 0.9027
## 2 3 4 Public Southeast City 33.50570 -86.79935 0.9181
## 3 3 4 Private Southeast City 32.36261 -86.17401 NA
## 4 3 4 Public Southeast City 34.72456 -86.64045 0.8123
## 5 3 4 Public Southeast City 32.36432 -86.29568 0.9787
## 6 3 4 Public Southeast City 33.21187 -87.54598 0.5330
## MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1 18 929 0 4824 2.5 90.7 0.9 0.2 5.6 6.6
## 2 25 1195 0 12866 57.8 25.9 3.3 5.9 7.1 25.2
## 3 NA NA 1 322 7.1 14.3 0.6 0.3 77.6 54.4
## 4 28 1322 0 6917 74.2 10.7 4.6 4.0 6.5 15.0
## 5 18 935 0 4189 1.5 93.8 1.0 0.3 3.5 7.7
## 6 28 1278 0 32387 78.5 10.1 4.7 1.2 5.6 7.9
## NetPrice Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1 15184 22886 9857 18236 9227 7298 6983
## 2 17535 24129 8328 19032 11612 17235 10640
## 3 9649 15080 6900 6900 14738 5265 3866
## 4 19986 22108 10280 21480 8727 9748 9391
## 5 12874 19413 11068 19396 9003 7983 7399
## 6 21973 28836 10780 28100 13574 10894 10016
## FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1 71.3 71.0 23.96 1068 56.4 36.6 23.6
## 2 89.9 35.3 52.92 3755 63.9 34.1 34.5
## 3 100.0 74.2 18.18 109 64.9 51.3 15.0
## 4 64.6 27.7 48.62 1347 47.6 31.0 44.8
## 5 54.2 73.8 27.69 1294 61.3 34.3 22.1
## 6 74.0 18.0 67.87 6430 61.5 22.6 66.7
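The preview already shows missing values (Amridge University has no AdmitRate, MidACT, or AvgSAT), which is why the code below passes `na.rm = TRUE` or `use = "complete.obs"`. A minimal sketch for counting how many values each analysis variable is missing, assuming the column names shown in the preview; `count_missing` and `analysis_vars` are hypothetical names, not part of the original analysis.

```r
# Hypothetical helper: number of NA values in each requested column
count_missing <- function(df, vars) {
  # Fail early on a misspelled column name (note the dataset's own "TuitonOut")
  unknown <- setdiff(vars, names(df))
  if (length(unknown) > 0) {
    stop("Unknown columns: ", paste(unknown, collapse = ", "))
  }
  colSums(is.na(df[, vars, drop = FALSE]))
}

# Columns used in this report (assumed from the head() preview)
analysis_vars <- c("NetPrice", "CompRate", "AvgSAT", "AdmitRate", "Enrollment",
                   "MidACT", "FacSalary", "TuitionIn", "TuitonOut",
                   "MedIncome", "Female")
```

After the `read.csv()` call, `count_missing(college, analysis_vars)` shows how many colleges each summary silently drops.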
mean(college$NetPrice, na.rm = TRUE)
## [1] 19886.82
tapply(college$CompRate, college$Control, median, na.rm = TRUE)
## Private Profit Public
## 56.240 26.390 48.575
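A group median is easier to interpret when you know how many colleges it summarizes, since the Private, Profit, and Public groups can differ greatly in size. A minimal sketch that reports the count of non-missing values next to each group median; `median_with_n` is a hypothetical helper.

```r
# Hypothetical helper: per-group median and number of non-missing values
median_with_n <- function(x, group) {
  ok <- !is.na(x) & !is.na(group)          # drop rows missing either value
  n <- tapply(x[ok], group[ok], length)
  med <- tapply(x[ok], group[ok], median)
  data.frame(group = names(med),
             n = as.integer(n),
             median = as.numeric(med),
             row.names = NULL)
}
```

For example, `median_with_n(college$CompRate, college$Control)` reproduces the medians above with a count column added.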
cor(college$AvgSAT, college$AdmitRate, use = "complete.obs")
## [1] -0.4221255
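The correlation of -0.42 above is a point estimate. As a sketch, base R's `cor.test()` adds a 95% confidence interval and a test of zero correlation; like `use = "complete.obs"`, it keeps only colleges with both values present. `sat_admit_cor` is a hypothetical wrapper.

```r
# Hypothetical wrapper around base R's cor.test() for AvgSAT vs. AdmitRate
sat_admit_cor <- function(df) {
  # Pearson correlation; cor.test() drops rows where either value is NA
  res <- cor.test(df$AvgSAT, df$AdmitRate, method = "pearson")
  c(r = unname(res$estimate),
    lower = res$conf.int[1],
    upper = res$conf.int[2],
    p_value = res$p.value)
}
```

Calling `sat_admit_cor(college)` should return the same `r` as the `cor()` line above, plus the interval bounds and p-value.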
var(college$Enrollment, na.rm = TRUE)
## [1] 55846805
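Variance is measured in squared students, which is hard to read directly. Its square root, the standard deviation, is about 7,473 students. The sketch below reports it together with the median and interquartile range, which are less affected by a few very large universities; `spread_summary` is a hypothetical helper.

```r
# Hypothetical helper: several measures of spread for one numeric variable
spread_summary <- function(x) {
  x <- x[!is.na(x)]
  c(n = length(x),
    variance = var(x),
    sd = sd(x),            # sd(x) equals sqrt(var(x))
    median = median(x),
    iqr = IQR(x))          # 75th minus 25th percentile (default quantile type 7)
}
```

`spread_summary(college$Enrollment)` returns the same variance as above alongside the more interpretable measures.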
quantile(college$AvgSAT, 0.75, na.rm = TRUE) # Top 25% cutoff
## 75%
## 1198
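# Proxy for "strong engineering programs": AvgSAT in the top 25% and MidACT of at least 27
# (subset() treats an NA condition as FALSE, so those rows are dropped)
engineering <- subset(college, AvgSAT >= quantile(college$AvgSAT, 0.75, na.rm = TRUE) & MidACT >= 27)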
mean(engineering$FacSalary, na.rm = TRUE)
## [1] 11327.41
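The mean of $11,327 for this engineering proxy supports a claim of higher faculty salaries only when compared with other colleges. A minimal sketch, assuming the same proxy definition as above (AvgSAT at or above its 75th percentile and MidACT of at least 27); `engineering_flag` and `compare_group_means` are hypothetical helpers.

```r
# Hypothetical helper: the report's engineering proxy as a logical vector
engineering_flag <- function(df) {
  cutoff <- quantile(df$AvgSAT, 0.75, na.rm = TRUE)
  df$AvgSAT >= cutoff & df$MidACT >= 27
}

# Hypothetical helper: mean of `var` for flagged rows, other rows, and all rows
compare_group_means <- function(df, flag, var) {
  x <- df[[var]]
  # Rows where the flag is NA are excluded from both groups
  c(flagged = mean(x[flag %in% TRUE], na.rm = TRUE),
    others  = mean(x[flag %in% FALSE], na.rm = TRUE),
    all     = mean(x, na.rm = TRUE))
}
```

Usage: `compare_group_means(college, engineering_flag(college), "FacSalary")`; the `flagged` entry should match the mean above.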
# In-state vs. out-of-state tuition; "TuitonOut" is spelled this way in the CSV
boxplot(college$TuitionIn, college$TuitonOut,
        names = c("In-State", "Out-of-State"),
        col = c("lightgreen", "lightblue"),
        main = "Tuition Comparison",
        ylab = "Tuition ($)")
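A boxplot compares the two tuition distributions side by side but does not pair each college's in-state and out-of-state price. As a sketch, the per-college difference can be summarized by `Control`, including the share of colleges that charge the same either way; `tuition_gap_by_control` is a hypothetical helper.

```r
# Hypothetical helper: per-college out-of-state premium, summarized by control type
tuition_gap_by_control <- function(df) {
  gap <- df$TuitonOut - df$TuitionIn        # column name spelled as in the CSV
  ok <- !is.na(gap) & !is.na(df$Control)
  med <- tapply(gap[ok], df$Control[ok], median)
  eq  <- tapply(gap[ok] == 0, df$Control[ok], mean)   # share with no premium
  data.frame(control = names(med),
             median_gap = as.numeric(med),
             share_equal = as.numeric(eq),
             row.names = NULL)
}
```

Running `tuition_gap_by_control(college)` shows which control types drive the gap visible in the boxplot.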
tapply(college$MedIncome, college$Locale, median, na.rm = TRUE)
## City Rural Suburb Town
## 39.4 39.7 46.0 47.8
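The four locale medians differ by at most 8.4 (thousand dollars), and the city and rural medians differ by only 0.3. Quartiles and group sizes show whether such gaps are large relative to the spread within each locale. A minimal sketch; `quartiles_by_group` is a hypothetical helper.

```r
# Hypothetical helper: count and quartiles of a variable within each group
quartiles_by_group <- function(x, group) {
  ok <- !is.na(x) & !is.na(group)
  parts <- split(x[ok], group[ok], drop = TRUE)   # one numeric vector per group
  out <- t(sapply(parts, function(v) {
    c(length(v), quantile(v, c(0.25, 0.5, 0.75), names = FALSE))
  }))
  colnames(out) <- c("n", "q1", "median", "q3")
  out
}
```

For example, `quartiles_by_group(college$MedIncome, college$Locale)` returns one row per locale.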
selective <- subset(college, AdmitRate < 0.5)
sd(selective$AvgSAT, na.rm = TRUE)
## [1] 197.1435
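A claim that selective colleges are more consistent in SAT scores needs a reference value for their standard deviation of about 197 points, such as the standard deviation for the remaining colleges. A minimal sketch using the same `AdmitRate < 0.5` cutoff; `sd_by_selectivity` is a hypothetical helper.

```r
# Hypothetical helper: SD of AvgSAT for selective vs. non-selective colleges
sd_by_selectivity <- function(df, cutoff = 0.5) {
  selective <- df$AdmitRate < cutoff      # NA when AdmitRate is missing
  c(selective = sd(df$AvgSAT[selective %in% TRUE], na.rm = TRUE),
    other     = sd(df$AvgSAT[selective %in% FALSE], na.rm = TRUE),
    all       = sd(df$AvgSAT, na.rm = TRUE))
}
```

`sd_by_selectivity(college)` should give a `selective` value matching the output above, with the other two for comparison.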
tapply(college$Female, college$Region, mean, na.rm = TRUE)
## Midwest Northeast Southeast Territory West
## 58.73613 59.44461 59.47673 56.61702 59.86353
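A regional mean can be pulled around when a region contains few colleges, so it helps to see the count behind each value. A short sketch using base R's `aggregate()` with a formula, which drops rows with missing values by default; `region_means` is a hypothetical helper.

```r
# Hypothetical helper: regional mean and count of non-missing values
region_means <- function(df, var = "Female") {
  f <- reformulate("Region", response = var)      # e.g. Female ~ Region
  means  <- aggregate(f, data = df, FUN = mean)   # NA rows dropped (na.omit)
  counts <- aggregate(f, data = df, FUN = length)
  names(means)[2] <- "mean"
  names(counts)[2] <- "n"
  merge(counts, means, by = "Region")
}
```

`region_means(college)` reproduces the means above with a column of group sizes.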
hist(college$CompRate,
     main = "Histogram of Completion Rates",
     xlab = "Completion Rate (%)",
     col = "orange",
     border = "white")
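The skew seen in a histogram can be cross-checked numerically: for a right-skewed distribution the mean usually exceeds the median and the moment-based skewness coefficient is positive, while both flip for a left-skewed one. Base R has no built-in skewness function, so this minimal sketch computes it directly; `skew_check` is a hypothetical helper.

```r
# Hypothetical helper: simple numeric checks on the direction of skew
skew_check <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))            # population SD, as in the moment formula
  c(mean = m,
    median = median(x),
    skewness = mean((x - m)^3) / s^3)   # > 0 suggests right skew, < 0 left skew
}
```

Applying `skew_check(college$CompRate)` gives a quick numeric confirmation of the shape read off the histogram.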
In this project, I conducted an exploratory data analysis on U.S. college data using various descriptive statistics and visualization techniques. The goal was to uncover insights into financial, demographic, and academic characteristics of higher education institutions.
Key findings include:
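The average net price across all colleges is approximately $19,887, providing a baseline for college affordability.
Completion rates (used here as the graduation rate) differ by control type: private colleges have the highest median (56.2%), public colleges are about 7.7 percentage points lower (48.6%), and for-profit colleges lag well behind (26.4%).
There is a moderate negative correlation (r ≈ -0.42) between average SAT scores and admission rates: colleges that admit a smaller share of applicants tend to have higher average SAT scores.
The variance in enrollment is large (about 55.8 million, i.e., a standard deviation of roughly 7,473 students), reflecting the wide range of college sizes, from small liberal arts colleges to large state universities.
Colleges with strong engineering indicators (average SAT at or above the 75th percentile of 1198 and median ACT of at least 27, used as a proxy) have a mean faculty salary of about $11,327. Whether this exceeds the average at other colleges requires comparing it with their mean, which this analysis did not compute; if it does, demand for specialized faculty is one possible explanation.
The boxplot shows out-of-state tuition distributed higher than in-state tuition, with wider variation. Because a boxplot compares distributions rather than individual colleges, it does not show that every college charges more out of state; Amridge University, for example, charges $6,900 for both in the data preview above.
Median values of MedIncome (in thousands of dollars) are nearly identical for city (39.4) and rural (39.7) colleges and highest for town (47.8) and suburban (46.0) colleges, so these data do not show an advantage for urban colleges.
Among selective colleges (admission rates below 50%), average SAT scores have a standard deviation of about 197 points. Judging whether this is more consistent than at other colleges requires the standard deviation for the non-selective group, which was not computed.
The percentage of female students is fairly consistent across regions: regional means range from about 58.7% to 59.9% in the four main regions, with U.S. territories lower at about 56.6%.
The histogram of completion rates is right-skewed: most colleges fall at low to moderate completion rates, with a tail of colleges reaching very high rates.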
Overall, this analysis revealed meaningful patterns and comparisons across a variety of college attributes. It demonstrates how descriptive statistics can summarize a large dataset and uncover trends that can inform students, educators, and policymakers alike.