This analysis explores the CollegeScores4yr dataset of U.S. four-year colleges using basic statistical and graphical methods. It is organized around the following questions:
1. What is the average tuition cost across all four-year colleges in the dataset?
2. What is the median graduation/completion rate for these colleges?
3. How much variability is there in Pell Grant percentage among schools?
4. What is the standard deviation of average faculty salary across colleges?
5. Is there a relationship between tuition and median student debt?
6. What does the distribution of median parent income look like?
7. How do average SAT scores vary across public, private, and for-profit institutions?
8. Which colleges have the highest average SAT scores?
9. What is the standard deviation of the Pell Grant percentage across all colleges in the dataset?
10. Do colleges with a higher percentage of female students tend to have higher or lower completion rates?
We will explore the questions in detail.
blank = read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(blank)
## Name State ID Main
## 1 Alabama A & M University AL 100654 1
## 2 University of Alabama at Birmingham AL 100663 1
## 3 Amridge University AL 100690 1
## 4 University of Alabama in Huntsville AL 100706 1
## 5 Alabama State University AL 100724 1
## 6 The University of Alabama AL 100751 1
## Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
## MainDegree HighDegree Control Region Locale Latitude Longitude AdmitRate
## 1 3 4 Public Southeast City 34.78337 -86.56850 0.9027
## 2 3 4 Public Southeast City 33.50570 -86.79935 0.9181
## 3 3 4 Private Southeast City 32.36261 -86.17401 NA
## 4 3 4 Public Southeast City 34.72456 -86.64045 0.8123
## 5 3 4 Public Southeast City 32.36432 -86.29568 0.9787
## 6 3 4 Public Southeast City 33.21187 -87.54598 0.5330
## MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1 18 929 0 4824 2.5 90.7 0.9 0.2 5.6 6.6
## 2 25 1195 0 12866 57.8 25.9 3.3 5.9 7.1 25.2
## 3 NA NA 1 322 7.1 14.3 0.6 0.3 77.6 54.4
## 4 28 1322 0 6917 74.2 10.7 4.6 4.0 6.5 15.0
## 5 18 935 0 4189 1.5 93.8 1.0 0.3 3.5 7.7
## 6 28 1278 0 32387 78.5 10.1 4.7 1.2 5.6 7.9
## NetPrice Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1 15184 22886 9857 18236 9227 7298 6983
## 2 17535 24129 8328 19032 11612 17235 10640
## 3 9649 15080 6900 6900 14738 5265 3866
## 4 19986 22108 10280 21480 8727 9748 9391
## 5 12874 19413 11068 19396 9003 7983 7399
## 6 21973 28836 10780 28100 13574 10894 10016
## FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1 71.3 71.0 23.96 1068 56.4 36.6 23.6
## 2 89.9 35.3 52.92 3755 63.9 34.1 34.5
## 3 100.0 74.2 18.18 109 64.9 51.3 15.0
## 4 64.6 27.7 48.62 1347 47.6 31.0 44.8
## 5 54.2 73.8 27.69 1294 61.3 34.3 22.1
## 6 74.0 18.0 67.87 6430 61.5 22.6 66.7
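The `head()` output already shows `NA` entries (for example in AdmitRate, MidACT and AvgSAT), and several later results depend on how missing values are handled. A minimal sketch for counting missing values per column follows; `toy` is a small made-up data frame standing in for `blank`, so swap in `blank` to check the real data.

```r
# Count missing values in each column of a data frame.
# `toy` is a small made-up stand-in for `blank`; replace it with `blank`
# to check the real CollegeScores4yr data.
toy <- data.frame(
  AvgSAT = c(929, 1195, NA, 1322),
  Debt   = c(1068, NA, 109, 1347),
  Female = c(56.4, 63.9, 64.9, 47.6)
)

# is.na() on a data frame gives a logical matrix; colSums() counts TRUEs
na_counts <- colSums(is.na(toy))
na_counts

# Keep only the columns that have gaps, most missing values first
sort(na_counts[na_counts > 0], decreasing = TRUE)
```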
mean(blank$TuitionFTE, na.rm = TRUE)
## [1] 13622.03
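The mean of TuitionFTE (tuition revenue per full-time-equivalent student), used here as the measure of tuition cost, is $13,622.03 across the four-year colleges in the dataset with a non-missing value.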
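Because `na.rm = TRUE` drops missing values before averaging, the mean is taken over the schools that report TuitionFTE only. A minimal sketch with made-up numbers reproduces `mean(x, na.rm = TRUE)` by hand and counts how many schools were excluded:

```r
# Reproduce mean(x, na.rm = TRUE) by hand: sum the non-missing values and
# divide by how many there are. `tuition` is made-up illustrative data.
tuition <- c(9227, 11612, NA, 8727, 9003)

observed    <- tuition[!is.na(tuition)]
manual_mean <- sum(observed) / length(observed)

manual_mean                   # matches the built-in call on the next line
mean(tuition, na.rm = TRUE)
sum(is.na(tuition))           # number of schools left out of the average
```

The same check on the real data is `sum(is.na(blank$TuitionFTE))`.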
median(blank$CompRate, na.rm = TRUE)
## [1] 52.45
The median graduation/completion rate for the colleges in the dataset is 52.45%, meaning half of the institutions have completion rates below this value and half above.
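The "half below, half above" reading of the median can be checked directly. This sketch uses the CompRate values from the six rows printed by `head()`, plus one added `NA`; with an even number of values, `median()` averages the two middle ones.

```r
# Check the "half below, half above" reading of the median.
# CompRate values from the first six rows printed by head(), plus one NA.
comp_rate <- c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87, NA)

med <- median(comp_rate, na.rm = TRUE)

# Share of non-missing schools strictly below / strictly above the median
share_below <- mean(comp_rate < med, na.rm = TRUE)
share_above <- mean(comp_rate > med, na.rm = TRUE)
c(median = med, below = share_below, above = share_above)
```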
var(blank$Pell, na.rm = TRUE)
## [1] 319.7898
The variance of the Pell Grant percentage is 319.79 (in squared percentage points), indicating a substantial spread in the share of Pell Grant recipients across institutions.
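`var()` computes the sample variance: the sum of squared deviations from the mean divided by n - 1. Its units are squared percentage points, which is why the standard deviation (its square root) is easier to interpret. A sketch using the Pell values from the six rows printed by `head()`:

```r
# Sample variance from its definition: squared deviations from the mean,
# summed and divided by n - 1 (the same formula var() uses).
# Pell values from the first six rows printed by head().
pell <- c(71.0, 35.3, 74.2, 27.7, 73.8, 18.0)

n          <- length(pell)
manual_var <- sum((pell - mean(pell))^2) / (n - 1)

manual_var
var(pell)
sqrt(manual_var)   # back on the percentage-point scale; equals sd(pell)
```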
sd(blank$FacSalary, na.rm = TRUE)
## [1] 2563.004
The standard deviation of average faculty salary across colleges is $2,563.00 (FacSalary records average monthly salary), showing that faculty salaries vary moderately around the mean.
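Whether a standard deviation of $2,563 counts as "moderate" depends on the size of the mean. One common way to make that judgment is the coefficient of variation (standard deviation divided by mean). The helper `coef_var()` below is hypothetical, not a base R function, and the demo uses the FacSalary values from the six rows printed by `head()`:

```r
# Coefficient of variation: standard deviation relative to the mean.
# coef_var() is a hypothetical helper, not part of base R.
coef_var <- function(x) {
  sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)
}

# FacSalary values from the first six rows printed by head()
fac_salary <- c(6983, 10640, 3866, 9391, 7399, 10016)
coef_var(fac_salary)
```

On the full data, `coef_var(blank$FacSalary)` expresses the spread as a fraction of the mean salary.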
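# use = "everything" returns NA whenever either column has a missing value,
# so restrict the correlation to schools with both values present
cor(blank$TuitionFTE, blank$Debt, use = "complete.obs")
With `use = "everything"` the correlation comes out as NA because of missing values. Using only complete pairs, there is a very weak negative relationship between tuition and median student debt, with a correlation of -0.091, meaning higher tuition does not meaningfully predict higher student debt in this dataset.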
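The sketch below shows how the `use` argument of `cor()` treats missing values, using two made-up vectors with one `NA` each. `"everything"` propagates the `NA`, while `"complete.obs"` drops any pair with a missing value, which matches filtering with `complete.cases()` first.

```r
# How the `use` argument of cor() handles missing values.
# `tuition` and `debt` are made-up vectors with one NA each.
tuition <- c(9227, 11612, 14738, 8727, NA, 13574)
debt    <- c(1068, 3755, NA, 1347, 1294, 6430)

r_everything <- cor(tuition, debt, use = "everything")    # NA propagates
r_complete   <- cor(tuition, debt, use = "complete.obs")  # incomplete pairs dropped

# Same result by filtering complete pairs explicitly
keep     <- complete.cases(tuition, debt)
r_manual <- cor(tuition[keep], debt[keep])

c(everything = r_everything, complete = r_complete, manual = r_manual)
```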
hist(blank$MedIncome, main = "Distribution of Median Parent Income", xlab = "Median Parent Income ($1,000s)", col = "lightblue", border = "black")
The distribution of median parent income is right-skewed. Most institutions cluster in the $20k–$60k range, while a smaller number of colleges serve student bodies with much higher median parent incomes (up to about $180k), producing a long right tail. The distribution is unimodal, with its peak around $35k–$45k.
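The right skew seen in the histogram can be backed up numerically: in a right-skewed distribution the mean typically exceeds the median, and the moment skewness is positive. Base R has no skewness function, so `skewness_moment()` below is a hypothetical helper. MedIncome appears to be recorded in thousands of dollars (the `head()` values run from 15.0 to 66.7); the demo uses those six values.

```r
# Two numerical checks for right skew:
# (1) the mean exceeds the median, (2) the moment skewness is positive.
# skewness_moment() is a hypothetical helper, not part of base R.
skewness_moment <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# MedIncome values (in $1,000s) from the first six rows printed by head()
med_income <- c(23.6, 34.5, 15.0, 44.8, 22.1, 66.7)
c(mean   = mean(med_income),
  median = median(med_income),
  skew   = skewness_moment(med_income))
```

Applied to `blank$MedIncome`, a positive result would agree with the shape of the histogram.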
df_clean = blank[!is.na(blank$AvgSAT), ]
aggregate(AvgSAT ~ Control, data = df_clean, summary)
## Control AvgSAT.Min. AvgSAT.1st Qu. AvgSAT.Median AvgSAT.Mean AvgSAT.3rd Qu.
## 1 Private 822.000 1048.250 1124.000 1145.839 1212.000
## 2 Profit 1027.000 1032.000 1139.000 1123.600 1146.000
## 3 Public 564.000 1045.000 1105.000 1118.910 1177.750
## AvgSAT.Max.
## 1 1558.000
## 2 1274.000
## 3 1436.000
boxplot(AvgSAT ~ Control, data = df_clean, main = "Average SAT Scores by Institution Type", xlab = "Institution Type (Control)", ylab = "Average SAT Score", col = c("lightblue", "lightgreen", "lightpink"))
The boxplot and summary table show that private institutions have the highest mean AvgSAT (about 1146) and the widest spread at the upper end, reaching 1558. Public institutions have the lowest median (1105) and mean (about 1119) and the lowest minimum (564), though their distribution overlaps heavily with that of private schools. For-profit institutions have the narrowest range (1027 to 1274) and actually the highest median (1139), but the group appears to be very small, so its comparison with the other two types is unreliable.
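Group sizes matter when comparing boxes. The five values in the Profit row of the summary (1027, 1032, 1139, 1146, 1274) average exactly to its reported mean of 1123.6, which is consistent with only five for-profit schools reporting an AvgSAT. A sketch for checking group sizes and medians directly, using a made-up data frame `sat` in place of `df_clean`:

```r
# Count schools per Control group and compute group medians.
# `sat` is a made-up stand-in for df_clean (rows with non-missing AvgSAT).
sat <- data.frame(
  Control = c("Private", "Public", "Private", "Profit", "Public", "Public"),
  AvgSAT  = c(1124, 1105, 1212, 1139, 1045, 1178)
)

group_sizes <- table(sat$Control)                    # schools per group
medians     <- tapply(sat$AvgSAT, sat$Control, median)

group_sizes
medians
```

On the real data, `table(df_clean$Control)` gives the group sizes behind each box.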
top_sat <- blank[!is.na(blank$AvgSAT), ]
top_sat <- top_sat[order(-top_sat$AvgSAT), ][1:10, ]
top_sat[, c("Name", "AvgSAT")]  # list the ten schools by name
par(mar = c(14, 4, 4, 2))  # widen the bottom margin so rotated college names fit
barplot(height = top_sat$AvgSAT, names.arg = top_sat$Name, las = 2, ylim = c(0, max(top_sat$AvgSAT) * 1.1), main = "Top 10 Colleges by Average SAT Score", ylab = "Average SAT Score", cex.names = 0.7)
The bar chart ranks the ten colleges in the dataset with the highest average SAT scores; they include highly selective institutions such as Caltech, Rice, and MIT.
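The top-10 selection can be wrapped in a small reusable function. `top_n_by()` is a hypothetical helper, and the demo runs on a made-up data frame `schools`. Using `head(df, n)` instead of `[1:10, ]` avoids padding the result with all-`NA` rows when fewer than `n` schools have a value.

```r
# Return the n rows with the largest values of `col`, skipping NAs.
# top_n_by() is a hypothetical helper, not part of base R.
top_n_by <- function(df, col, n = 10) {
  df <- df[!is.na(df[[col]]), , drop = FALSE]
  df <- df[order(df[[col]], decreasing = TRUE), , drop = FALSE]
  head(df, n)
}

# Made-up example data standing in for `blank`
schools <- data.frame(
  Name   = c("A", "B", "C", "D"),
  AvgSAT = c(1195, NA, 1322, 929)
)
top_n_by(schools, "AvgSAT", n = 2)[, c("Name", "AvgSAT")]
```

On the real data, `top_n_by(blank, "AvgSAT")[, c("Name", "AvgSAT")]` prints the ten names with their scores.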
sd(blank$Pell, na.rm = TRUE)
## [1] 17.88267
The standard deviation of the Pell Grant percentage across all colleges is 17.88, indicating that Pell participation rates differ substantially between institutions.
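Since the standard deviation is the square root of the variance, this result should agree with the variance of 319.79 reported for the same column. A quick consistency check on the two printed values:

```r
# sd is the square root of the variance, so the two Pell results
# printed above should agree: sqrt(319.7898) vs 17.88267.
pell_var <- 319.7898
pell_sd  <- 17.88267

sqrt(pell_var)
abs(sqrt(pell_var) - pell_sd) < 1e-4   # TRUE up to rounding of the printed values
```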
cor(blank$Female, blank$CompRate, use = "na.or.complete")
## [1] -0.09712664
Colleges with a higher percentage of female students tend to have slightly lower completion rates, but the relationship is extremely weak: the correlation is -0.097, so there is essentially no linear association between the two variables.
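Squaring the correlation shows how weak the link is: r² = (-0.097)² is under 0.01, so the share of variance in completion rate linearly associated with the percentage of female students is below 1%. The sketch computes Pearson's r from its definition with a hypothetical helper `pearson_r()`. The demo values are the Female and CompRate entries from the six rows printed by `head()`, far too few to say anything about the full dataset.

```r
# Pearson correlation from its definition, plus r^2.
# pearson_r() is a hypothetical helper; like use = "na.or.complete",
# it drops any pair with a missing value before computing r.
pearson_r <- function(x, y) {
  ok <- complete.cases(x, y)
  x <- x[ok]
  y <- y[ok]
  sum((x - mean(x)) * (y - mean(y))) /
    sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
}

# Female and CompRate values from the first six rows printed by head()
female    <- c(56.4, 63.9, 64.9, 47.6, 61.3, 61.5)
comp_rate <- c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87)

r <- pearson_r(female, comp_rate)
c(r = r, builtin = cor(female, comp_rate), r_squared = r^2)

# r^2 for the full-data correlation reported above
(-0.09712664)^2
```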
This exploration of U.S. four-year colleges reveals several key patterns in cost, demographics, and performance. Average tuition revenue per full-time-equivalent student is about $13,622, and the median completion rate is about 52%. Colleges vary widely in the share of Pell Grant recipients, with a standard deviation of nearly 18 percentage points (a variance of about 320), indicating substantial differences in the socioeconomic makeup of student bodies.
Faculty salaries show moderate variability, and the one financial relationship examined is weak: tuition and student debt are almost uncorrelated. Median parent income is right-skewed, with most schools serving families in the $20k–$60k range but a smaller group drawing from far wealthier households.
Average SAT scores differ somewhat by institution type: private colleges have the highest mean and the highest top scores, public colleges have the lowest median and mean, and the small group of for-profit colleges with reported scores falls in a narrow middle range. The highest-scoring schools include Caltech, Rice, MIT, and other highly selective institutions. Lastly, the percentage of female students shows almost no association with completion rates.
Overall, the dataset highlights broad diversity among four-year colleges in cost, student backgrounds, and academic selectivity.