Introduction

This analysis explores the CollegeScores4yr dataset, which describes U.S. four-year colleges, using basic statistical and graphical methods. It addresses the following ten questions:

1. What is the average tuition cost across all four-year colleges in the dataset?

2. What is the median graduation/completion rate for these colleges?

3. How much variability is there in Pell Grant percentage among schools?

4. What is the standard deviation of average faculty salary across colleges?

5. Is there a relationship between tuition and median student debt?

6. What does the distribution of median parent income look like?

7. How do average SAT scores vary across public, private, and for-profit institutions?

8. Which colleges have the highest average SAT scores?

9. What is the standard deviation of the Pell Grant percentage across all colleges in the dataset?

10. Do colleges with a higher percentage of female students tend to have higher or lower completion rates?

Analysis

We will explore the questions in detail.

blank = read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(blank)
##                                  Name State     ID Main
## 1            Alabama A & M University    AL 100654    1
## 2 University of Alabama at Birmingham    AL 100663    1
## 3                  Amridge University    AL 100690    1
## 4 University of Alabama in Huntsville    AL 100706    1
## 5            Alabama State University    AL 100724    1
## 6           The University of Alabama    AL 100751    1
##                                                                Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
##   MainDegree HighDegree Control    Region Locale Latitude Longitude AdmitRate
## 1          3          4  Public Southeast   City 34.78337 -86.56850    0.9027
## 2          3          4  Public Southeast   City 33.50570 -86.79935    0.9181
## 3          3          4 Private Southeast   City 32.36261 -86.17401        NA
## 4          3          4  Public Southeast   City 34.72456 -86.64045    0.8123
## 5          3          4  Public Southeast   City 32.36432 -86.29568    0.9787
## 6          3          4  Public Southeast   City 33.21187 -87.54598    0.5330
##   MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1     18    929      0       4824   2.5  90.7      0.9   0.2   5.6      6.6
## 2     25   1195      0      12866  57.8  25.9      3.3   5.9   7.1     25.2
## 3     NA     NA      1        322   7.1  14.3      0.6   0.3  77.6     54.4
## 4     28   1322      0       6917  74.2  10.7      4.6   4.0   6.5     15.0
## 5     18    935      0       4189   1.5  93.8      1.0   0.3   3.5      7.7
## 6     28   1278      0      32387  78.5  10.1      4.7   1.2   5.6      7.9
##   NetPrice  Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1    15184 22886      9857     18236       9227        7298      6983
## 2    17535 24129      8328     19032      11612       17235     10640
## 3     9649 15080      6900      6900      14738        5265      3866
## 4    19986 22108     10280     21480       8727        9748      9391
## 5    12874 19413     11068     19396       9003        7983      7399
## 6    21973 28836     10780     28100      13574       10894     10016
##   FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1        71.3 71.0    23.96 1068   56.4     36.6      23.6
## 2        89.9 35.3    52.92 3755   63.9     34.1      34.5
## 3       100.0 74.2    18.18  109   64.9     51.3      15.0
## 4        64.6 27.7    48.62 1347   47.6     31.0      44.8
## 5        54.2 73.8    27.69 1294   61.3     34.3      22.1
## 6        74.0 18.0    67.87 6430   61.5     22.6      66.7

Q1: What is the average tuition cost across all four-year colleges in the dataset?

mean(blank$TuitionFTE, na.rm = TRUE)
## [1] 13622.03

Measured by the TuitionFTE column, the average tuition cost across all four-year colleges in the dataset is $13,622.03 (colleges with a missing value are excluded).

Q2: What is the median graduation/completion rate for these colleges?

median(blank$CompRate, na.rm = TRUE)
## [1] 52.45

The median graduation/completion rate for the colleges in the dataset is 52.45%, meaning half of the institutions have completion rates below this value and half above.

Q3: How much variability is there in Pell Grant percentage among schools?

var(blank$Pell, na.rm = TRUE)
## [1] 319.7898

The variance of Pell Grant percentage among schools is 319.79 (in squared percentage points), which corresponds to a standard deviation of about 17.9 percentage points and indicates a substantial spread in the Pell Grant percentage across institutions.
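Because variance is expressed in squared units, it can be hard to read on its own. The following is a minimal sketch of spread measures on the original scale, using only the six Pell values visible in the head() output above as stand-in data. The name pell_sample is an assumption made for illustration; substituting blank$Pell would apply the same calls to the full column.

```r
# Pell percentages for the six rows shown by head(blank); illustrative subset only
pell_sample <- c(71.0, 35.3, 74.2, 27.7, 73.8, 18.0)

pell_var <- var(pell_sample, na.rm = TRUE)  # squared percentage points
pell_sd  <- sd(pell_sample, na.rm = TRUE)   # percentage points
pell_iqr <- IQR(pell_sample, na.rm = TRUE)  # spread of the middle 50% of values

# The standard deviation is the square root of the variance
all.equal(pell_sd, sqrt(pell_var))
```

The standard deviation and interquartile range are both in percentage points, so they can be compared directly with the Pell values themselves.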

Q4: What is the standard deviation of average faculty salary across colleges?

sd(blank$FacSalary, na.rm = TRUE)
## [1] 2563.004

The standard deviation of average faculty salary across colleges is $2,563.00, showing that faculty salaries vary moderately around the mean.

Q5: Is there a relationship between tuition and median student debt?

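round(cor(blank$TuitionFTE, blank$Debt, use = "complete.obs"), 3)
## [1] -0.091

Missing values are excluded with use = "complete.obs"; the default use = "everything" would return NA here because at least one of the two columns has missing entries. There is a very weak negative relationship between tuition and median student debt, with a correlation of -0.091, meaning higher tuition does not meaningfully predict higher student debt in this dataset.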
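The use argument of cor() controls how missing values are handled, which matters for columns that contain NAs. The sketch below shows the difference on two short vectors. The vectors tuition and debt are hypothetical toy values, not taken from the dataset.

```r
# Hypothetical toy vectors, not taken from the dataset
tuition <- c(9000, 11500, NA, 8700, 9000, 13500)
debt    <- c(1100, 3800, 100, NA, 1300, 6400)

# Default (use = "everything"): any NA in either vector makes the result NA
r_everything <- cor(tuition, debt)

# "complete.obs": drop pairs where either value is missing, then correlate
r_complete <- cor(tuition, debt, use = "complete.obs")

# Number of pairs actually used in the complete-case calculation
n_pairs <- sum(complete.cases(tuition, debt))
```

Reporting n_pairs alongside the correlation makes it clear how many colleges the estimate is based on.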

Q6: What does the distribution of median parent income look like?

hist(blank$MedIncome, main = "Distribution of Median Parent Income", xlab = "Median Parent Income", col = "lightblue", border = "black")

The distribution of median parent income is right-skewed. Most institutions cluster in the $20k–$60k range. A smaller number of colleges serve student bodies with much higher median parent incomes (up to ~$180k), producing a long right tail. The distribution is unimodal, with the peak around $35k–$45k.
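The shape read off a histogram can be cross-checked numerically: in a right-skewed distribution the mean usually exceeds the median. The sketch below applies that check to the six MedIncome values visible in the head() output, assuming (as the text above does) that the column is recorded in thousands of dollars. The name income_sample is an assumption for illustration; blank$MedIncome could be substituted for the full column.

```r
# MedIncome (in $1,000s) for the six rows shown by head(blank); illustrative subset
income_sample <- c(23.6, 34.5, 15.0, 44.8, 22.1, 66.7)

income_mean   <- mean(income_sample, na.rm = TRUE)
income_median <- median(income_sample, na.rm = TRUE)

# A mean above the median is consistent with (though not proof of) right skew
income_mean > income_median

# Quartiles show how far the upper tail stretches relative to the lower one
income_quartiles <- quantile(income_sample, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
income_quartiles
```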

Q7: How do average SAT scores vary across public, private, and for-profit institutions?

df_clean = blank[!is.na(blank$AvgSAT), ]

aggregate(AvgSAT ~ Control, data = df_clean, summary)
##   Control AvgSAT.Min. AvgSAT.1st Qu. AvgSAT.Median AvgSAT.Mean AvgSAT.3rd Qu.
## 1 Private     822.000       1048.250      1124.000    1145.839       1212.000
## 2  Profit    1027.000       1032.000      1139.000    1123.600       1146.000
## 3  Public     564.000       1045.000      1105.000    1118.910       1177.750
##   AvgSAT.Max.
## 1    1558.000
## 2    1274.000
## 3    1436.000
boxplot(AvgSAT ~ Control, data = df_clean, main = "Average SAT Scores by Institution Type", xlab = "Institution Type (Control)", ylab = "Average SAT Score", col = c("lightblue", "lightgreen", "lightpink"))

The summary table and boxplot show that private institutions have the highest mean AvgSAT (1145.8) and the highest maximum (1558), with a wider spread at the upper end. For-profit institutions have the highest median (1139) but the narrowest range (1027 to 1274), and their mean (1123.6) falls between the private and public means. Public institutions have the lowest median (1105) and mean (1118.9), as well as the lowest minimum (564). The differences in center are small relative to the spread within each group, so the three distributions overlap substantially.
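Group summaries are easier to interpret alongside group sizes, since a median based on a handful of schools is less stable than one based on hundreds. The sketch below shows one way to tabulate both with base R. The data frame toy is hypothetical and only mimics the Control and AvgSAT columns; with the real data, df_clean would take its place.

```r
# Hypothetical toy data, not taken from the dataset
toy <- data.frame(
  Control = c("Private", "Private", "Profit", "Public", "Public", "Public"),
  AvgSAT  = c(1200, NA, 1100, 1050, 1150, 1000),
  stringsAsFactors = FALSE
)

# Keep only rows with a reported SAT score
toy_clean <- toy[!is.na(toy$AvgSAT), ]

# Number of schools per institution type
group_n <- table(toy_clean$Control)

# Median AvgSAT per institution type (same alphabetical group order as table())
group_median <- tapply(toy_clean$AvgSAT, toy_clean$Control, median)

# Side-by-side view: sample size next to the median
group_summary <- data.frame(
  n      = as.vector(group_n),
  median = as.vector(group_median),
  row.names = names(group_median)
)
group_summary
```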

Q8: Which colleges have the highest average SAT scores?

top_sat <- blank[!is.na(blank$AvgSAT), ]
top_sat <- top_sat[order(-top_sat$AvgSAT), ][1:10, ]
barplot(height = top_sat$AvgSAT, names.arg = top_sat$Name, las = 2, ylim = c(0, max(top_sat$AvgSAT) * 1.1), main = "Top 10 Colleges by Average SAT Score", xlab = "College", ylab = "Average SAT Score", cex.names = 0.7)

The ten colleges with the highest average SAT scores in the dataset are:

  1. California Institute of Technology (Caltech)
  2. Rice University
  3. Massachusetts Institute of Technology (MIT)
  4. Harvey Mudd College
  5. Johns Hopkins University
  6. University of Chicago
  7. Harvard University
  8. Franklin W. Olin College of Engineering
  9. Yale University
  10. Vanderbilt University
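A ranked list like this can also be printed as a small table, which shows the scores next to the names. The sketch below demonstrates the approach on the six rows visible in the head() output, so it returns the top three of those six rather than the national top ten. The names sat_sample, ranked, and top3 are assumptions for illustration.

```r
# Name and AvgSAT for the six rows shown by head(blank); illustrative subset
sat_sample <- data.frame(
  Name = c("Alabama A & M University",
           "University of Alabama at Birmingham",
           "Amridge University",
           "University of Alabama in Huntsville",
           "Alabama State University",
           "The University of Alabama"),
  AvgSAT = c(929, 1195, NA, 1322, 935, 1278),
  stringsAsFactors = FALSE
)

# Sort descending by AvgSAT; na.last = NA drops rows with a missing score
ranked <- sat_sample[order(sat_sample$AvgSAT, decreasing = TRUE, na.last = NA), ]

# Keep the top three rows (use 10 with the full dataset)
top3 <- head(ranked, 3)
top3
```

Applied to blank with 10 in place of 3, the same two steps would give the names and scores behind the bar chart.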

Q9: What is the standard deviation of the Pell Grant percentage across all colleges in the dataset?

sd(blank$Pell, na.rm = TRUE)
## [1] 17.88267

The standard deviation of the Pell Grant percentage across all colleges is 17.88 percentage points, indicating that Pell participation rates differ substantially between institutions.

Q10: Do colleges with a higher percentage of female students tend to have higher or lower completion rates?

cor(blank$Female, blank$CompRate, use = "na.or.complete")
## [1] -0.09712664

Colleges with a higher percentage of female students tend to have slightly lower completion rates, but the relationship is extremely weak: the correlation is -0.097, so there is essentially no meaningful linear association between the two variables.
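A correlation coefficient alone does not say how precisely it is estimated. Base R's cor.test() reports a confidence interval and p-value along with the estimate. The sketch below runs it on the Female and CompRate values from the six rows visible in the head() output; with so few points the interval will be very wide, and the result says nothing about the full dataset. The names female_sample, comprate_sample, and ct are assumptions for illustration.

```r
# Female and CompRate for the six rows shown by head(blank); illustrative subset
female_sample   <- c(56.4, 63.9, 64.9, 47.6, 61.3, 61.5)
comprate_sample <- c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87)

# Pearson correlation test (the default method)
ct <- cor.test(female_sample, comprate_sample)

ct$estimate  # sample correlation
ct$conf.int  # 95% confidence interval for the correlation
ct$p.value   # p-value for the null hypothesis of zero correlation
```

An interval that comfortably includes zero would support the reading that there is no meaningful linear association.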

Summary

This exploration of U.S. four-year colleges reveals several key patterns in tuition, demographics, and performance. Average tuition is about $13,622, and the median completion rate is 52%. Colleges vary widely in the share of Pell Grant recipients, with a standard deviation of nearly 18 and a variance of 320, indicating substantial differences in the socioeconomic makeup of student bodies.

Faculty salaries show moderate variability, and relationships among financial variables are weak: tuition and student debt show almost no correlation. Median parent income is right-skewed, with most schools serving families in the $20k–$60k range but a smaller group drawing from far wealthier households.

Average SAT scores differ only modestly by institution type: private colleges have the highest mean, for-profit colleges the highest median, and public institutions the lowest mean and median, with substantial overlap among the three groups. The highest-scoring schools include Caltech, Rice, MIT, and other highly selective institutions. Lastly, the percentage of female students shows almost no meaningful association with completion rates.

Overall, the dataset highlights broad diversity among four-year colleges in cost, student backgrounds, and academic selectivity.