This report uses the “CollegeScores4yr” dataset from https://www.lock5stat.com/datapage3e.html.
I posed 10 questions based on my own understanding of the data
and explore each of them in detail below.
college = read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)
## Name State ID Main
## 1 Alabama A & M University AL 100654 1
## 2 University of Alabama at Birmingham AL 100663 1
## 3 Amridge University AL 100690 1
## 4 University of Alabama in Huntsville AL 100706 1
## 5 Alabama State University AL 100724 1
## 6 The University of Alabama AL 100751 1
## Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
## MainDegree HighDegree Control Region Locale Latitude Longitude AdmitRate
## 1 3 4 Public Southeast City 34.78337 -86.56850 0.9027
## 2 3 4 Public Southeast City 33.50570 -86.79935 0.9181
## 3 3 4 Private Southeast City 32.36261 -86.17401 NA
## 4 3 4 Public Southeast City 34.72456 -86.64045 0.8123
## 5 3 4 Public Southeast City 32.36432 -86.29568 0.9787
## 6 3 4 Public Southeast City 33.21187 -87.54598 0.5330
## MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1 18 929 0 4824 2.5 90.7 0.9 0.2 5.6 6.6
## 2 25 1195 0 12866 57.8 25.9 3.3 5.9 7.1 25.2
## 3 NA NA 1 322 7.1 14.3 0.6 0.3 77.6 54.4
## 4 28 1322 0 6917 74.2 10.7 4.6 4.0 6.5 15.0
## 5 18 935 0 4189 1.5 93.8 1.0 0.3 3.5 7.7
## 6 28 1278 0 32387 78.5 10.1 4.7 1.2 5.6 7.9
## NetPrice Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1 15184 22886 9857 18236 9227 7298 6983
## 2 17535 24129 8328 19032 11612 17235 10640
## 3 9649 15080 6900 6900 14738 5265 3866
## 4 19986 22108 10280 21480 8727 9748 9391
## 5 12874 19413 11068 19396 9003 7983 7399
## 6 21973 28836 10780 28100 13574 10894 10016
## FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1 71.3 71.0 23.96 1068 56.4 36.6 23.6
## 2 89.9 35.3 52.92 3755 63.9 34.1 34.5
## 3 100.0 74.2 18.18 109 64.9 51.3 15.0
## 4 64.6 27.7 48.62 1347 47.6 31.0 44.8
## 5 54.2 73.8 27.69 1294 61.3 34.3 22.1
## 6 74.0 18.0 67.87 6430 61.5 22.6 66.7
mean(college$AdmitRate, na.rm = TRUE)
## [1] 0.6702025
The mean admission rate across colleges in this dataset is about 67%. Each college counts equally in this average, so it is not the same as the share of all applicants who are admitted.
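The `na.rm = TRUE` argument matters because some colleges, such as Amridge University in the preview above, report no admission rate. A minimal sketch of its effect, using only the six rows printed by `head(college)` (the name `admit_head` is introduced here for illustration):

```r
# AdmitRate values for the six colleges printed by head(college) above
admit_head <- c(0.9027, 0.9181, NA, 0.8123, 0.9787, 0.5330)

mean(admit_head)                 # NA: a single missing value propagates
mean(admit_head, na.rm = TRUE)   # mean of the five reported rates
sum(is.na(admit_head))           # how many colleges were dropped
```

Running `sum(is.na(college$AdmitRate))` on the full data would show how many colleges are excluded from the 67% figure.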
median(college$TuitionIn, na.rm = TRUE)
## [1] 17662
The median in-state tuition among the colleges in this dataset is $17,662, meaning that half of the schools charge less and half charge more.
sd(college$Debt, na.rm = TRUE)
## [1] 5360.986
The standard deviation in student debt is approximately $5,361, indicating there is substantial variation in how much students borrow across different colleges.
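To make the statistic concrete, the sketch below recomputes a sample standard deviation by hand and compares it with `sd()`. It uses only the six `Debt` values shown by `head(college)`, so the result is illustrative and will not match the full-data value; `debt_head` and `sd_manual` are names introduced for this example.

```r
# Debt values for the six colleges printed by head(college) above
debt_head <- c(1068, 3755, 109, 1347, 1294, 6430)

n <- length(debt_head)
# Sample standard deviation: root of the summed squared deviations over n - 1
sd_manual <- sqrt(sum((debt_head - mean(debt_head))^2) / (n - 1))

sd_manual
sd(debt_head)   # built-in function; agrees with sd_manual
```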
hist(college$AvgSAT,
col = "blue",
main = "Distribution of Average SAT Scores",
xlab = "SAT Score")
The histogram shows that the distribution of average SAT scores across
colleges is roughly bell-shaped and slightly right-skewed, with most
colleges reporting average SAT scores between 1000 and 1200. Very few
colleges have average SAT scores below 800 or above 1400, suggesting
that extremely low or high-scoring institutions are rare in this
dataset.
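The bin counts behind a histogram can also be read numerically instead of by eye. The sketch below does this for the six `AvgSAT` values from `head(college)`; the 200-point bin width and the names `sat_head` and `sat_bins` are assumptions chosen for illustration, not the breaks R picked for the plot above.

```r
# AvgSAT values for the six colleges printed by head(college) above
sat_head <- c(929, 1195, NA, 1322, 935, 1278)

# Drop the missing value, then count colleges per 200-point bin
# without drawing the plot
sat_bins <- hist(sat_head[!is.na(sat_head)],
                 breaks = seq(800, 1400, by = 200),
                 plot = FALSE)

sat_bins$breaks   # bin edges
sat_bins$counts   # number of colleges in each bin
```

On the full column the breaks must span the whole range of `college$AvgSAT`, otherwise `hist()` stops with an error.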
hist(college$MedIncome,
col = "green",
main = "Distribution of Median Family Income",
xlab = "Income (in $1,000s)")
The histogram shows that the distribution of colleges' median family
incomes is right-skewed, with most values concentrated between $20,000 and
$60,000.
boxplot(CompRate ~ Region,
data = college,
col = "orange",
main = "Completion Rate by Region",
xlab = "Region",
ylab = "Completion Rate")
The Northeast and Midwest regions have higher median completion rates
than the other regions, with many schools reaching 60–70%.
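The medians that a grouped boxplot draws can be computed directly with `tapply()`. Because the six rows shown by `head(college)` are all in the Southeast, this sketch groups by `Control` instead of `Region`; that substitution, and the names `comp_head`, `control_head`, and `group_medians`, are only for illustration.

```r
# CompRate and Control for the six colleges printed by head(college) above
comp_head    <- c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87)
control_head <- c("Public", "Public", "Private", "Public", "Public", "Public")

# Median completion rate within each group, ignoring missing values
group_medians <- tapply(comp_head, control_head, median, na.rm = TRUE)
group_medians
```

On the full data, `tapply(college$CompRate, college$Region, median, na.rm = TRUE)` would give the regional medians to compare against the plot.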
boxplot(FacSalary ~ HighDegree,
data = college,
col = "purple",
main = "Faculty Salary by Highest Degree",
xlab = "Highest Degree (0–4)",
ylab = "Faculty Salary ($)")
The boxplot reveals a clear pattern: faculty salaries tend to increase
with the highest degree level offered by a college.
cor(college$AdmitRate, college$AvgSAT, use = "complete.obs")
## [1] -0.4221255
The correlation of about -0.42 indicates a moderate negative linear relationship: colleges with lower admission rates (more selective colleges) tend to have higher average SAT scores. The association is far from perfect, but it is consistent with the idea that more competitive schools enroll higher-scoring students.
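As a sketch of what `cor(..., use = "complete.obs")` computes, the code below drops rows with a missing value in either variable and then applies the Pearson formula by hand. It uses only the six rows from `head(college)`, so the value differs from the full-data result; all variable names are introduced here for illustration.

```r
# AdmitRate and AvgSAT for the six colleges printed by head(college) above
admit_head <- c(0.9027, 0.9181, NA, 0.8123, 0.9787, 0.5330)
sat_head   <- c(929, 1195, NA, 1322, 935, 1278)

# Keep only rows where both values are present (what "complete.obs" does)
ok <- complete.cases(admit_head, sat_head)
x  <- admit_head[ok]
y  <- sat_head[ok]

# Pearson correlation: summed cross-deviations over the root of the
# product of the summed squared deviations
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_manual
cor(admit_head, sat_head, use = "complete.obs")   # agrees with r_manual
```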
cor(college$NetPrice, college$MedIncome, use = "complete.obs")
## [1] 0.5151298
The correlation of about 0.52 indicates a moderate positive relationship: colleges whose students come from higher-income families tend to have higher net prices, possibly because those students more often attend expensive private or out-of-state schools.
barplot(table(college$Region),
col = "lightblue",
main = "Number of Colleges by Region",
ylab = "Count")
The barplot shows that the Northeast region has the highest number of
colleges, followed closely by the Midwest, Southeast, and West regions.
The Territory region (which includes U.S. territories like Puerto Rico
or Guam) has very few colleges compared to the others.
This report explored several aspects of U.S. bachelor’s-granting colleges using descriptive statistics and visualizations in R. The mean admission rate across colleges is about 67%, and the median in-state tuition is $17,662. Student debt varies substantially between colleges, with a standard deviation of about $5,361. Histograms showed that most colleges have average SAT scores between 1000 and 1200 and that colleges' median family incomes are right-skewed, mostly falling between $20,000 and $60,000. Completion rates varied by region, with Northeast and Midwest schools showing higher medians, and faculty salaries increased with the highest degree offered. A moderate negative correlation (about -0.42) between admission rate and average SAT score indicates that more selective schools tend to have higher-scoring students. Net price was positively correlated with median family income (about 0.52), so colleges whose students come from higher-income families tend to have higher net prices. Finally, a barplot showed that most colleges are located in the Northeast, Midwest, and Southeast, highlighting regional concentrations in U.S. higher education.