1. Introduction

This report uses the “CollegeScores4yr” dataset from https://www.lock5stat.com/datapage3e.html.

I pose 10 questions about the data, based on my own understanding of its variables, and answer each one in turn.

2. Analysis

I first load the data and preview its first six rows, then address each question in detail.

college = read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)
##                                  Name State     ID Main
## 1            Alabama A & M University    AL 100654    1
## 2 University of Alabama at Birmingham    AL 100663    1
## 3                  Amridge University    AL 100690    1
## 4 University of Alabama in Huntsville    AL 100706    1
## 5            Alabama State University    AL 100724    1
## 6           The University of Alabama    AL 100751    1
##                                                                Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
##   MainDegree HighDegree Control    Region Locale Latitude Longitude AdmitRate
## 1          3          4  Public Southeast   City 34.78337 -86.56850    0.9027
## 2          3          4  Public Southeast   City 33.50570 -86.79935    0.9181
## 3          3          4 Private Southeast   City 32.36261 -86.17401        NA
## 4          3          4  Public Southeast   City 34.72456 -86.64045    0.8123
## 5          3          4  Public Southeast   City 32.36432 -86.29568    0.9787
## 6          3          4  Public Southeast   City 33.21187 -87.54598    0.5330
##   MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1     18    929      0       4824   2.5  90.7      0.9   0.2   5.6      6.6
## 2     25   1195      0      12866  57.8  25.9      3.3   5.9   7.1     25.2
## 3     NA     NA      1        322   7.1  14.3      0.6   0.3  77.6     54.4
## 4     28   1322      0       6917  74.2  10.7      4.6   4.0   6.5     15.0
## 5     18    935      0       4189   1.5  93.8      1.0   0.3   3.5      7.7
## 6     28   1278      0      32387  78.5  10.1      4.7   1.2   5.6      7.9
##   NetPrice  Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1    15184 22886      9857     18236       9227        7298      6983
## 2    17535 24129      8328     19032      11612       17235     10640
## 3     9649 15080      6900      6900      14738        5265      3866
## 4    19986 22108     10280     21480       8727        9748      9391
## 5    12874 19413     11068     19396       9003        7983      7399
## 6    21973 28836     10780     28100      13574       10894     10016
##   FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1        71.3 71.0    23.96 1068   56.4     36.6      23.6
## 2        89.9 35.3    52.92 3755   63.9     34.1      34.5
## 3       100.0 74.2    18.18  109   64.9     51.3      15.0
## 4        64.6 27.7    48.62 1347   47.6     31.0      44.8
## 5        54.2 73.8    27.69 1294   61.3     34.3      22.1
## 6        74.0 18.0    67.87 6430   61.5     22.6      66.7

Q1: What is the average admission rate of colleges in the dataset?

mean(college$AdmitRate, na.rm = TRUE)
## [1] 0.6702025

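On average, colleges in this dataset admit approximately 67% of their applicants. This is an unweighted mean across colleges that report an admission rate, so it describes the typical college rather than the share of all applicants who are admitted.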
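Because AdmitRate contains missing values (the third row of the preview shows NA), it can help to see what `na.rm = TRUE` does and how many colleges it drops. The following is a minimal sketch on a made-up vector, not the real data; `admit_rate`, `mean_reported`, and `n_missing` are illustrative names.

```r
# Hypothetical admission rates for five colleges; NA marks a school that
# did not report a rate (values are made up for illustration).
admit_rate <- c(0.90, 0.92, NA, 0.81, 0.53)

# Without na.rm = TRUE, a single NA makes the whole mean NA.
mean(admit_rate)

# With na.rm = TRUE, the mean uses only the reported values.
mean_reported <- mean(admit_rate, na.rm = TRUE)

# Count how many schools were left out of the calculation.
n_missing <- sum(is.na(admit_rate))

mean_reported
n_missing
```

Reporting the number of missing values alongside the mean makes it clear how many colleges the average is actually based on.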

Q2: What is the median in-state tuition across all colleges?

median(college$TuitionIn, na.rm = TRUE)
## [1] 17662

The median in-state tuition among colleges in this dataset is $17,662, meaning that half of the schools charge less and half charge more.

Q3: How much does average student debt vary across colleges?

sd(college$Debt, na.rm = TRUE)
## [1] 5360.986

The standard deviation of student debt is approximately $5,361, indicating substantial variation across colleges in how much their students borrow.
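To make the statistic concrete, the sketch below recomputes a sample standard deviation by hand and compares it with `sd()`. The debt values are hypothetical, chosen only to keep the arithmetic easy to follow.

```r
# Hypothetical debt values for five colleges (one missing).
debt <- c(1000, 3000, 5000, 7000, NA)

# Keep only the observed values, as na.rm = TRUE would.
debt_obs <- debt[!is.na(debt)]
n <- length(debt_obs)

# Sample standard deviation: square root of the sum of squared
# deviations from the mean, divided by n - 1.
manual_sd <- sqrt(sum((debt_obs - mean(debt_obs))^2) / (n - 1))

# The built-in function uses the same n - 1 denominator.
builtin_sd <- sd(debt, na.rm = TRUE)

all.equal(manual_sd, builtin_sd)
```

The n - 1 denominator means `sd()` returns the sample standard deviation, which is the usual choice when the colleges are treated as a sample.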

Q4: What does the distribution of average SAT scores look like?

hist(college$AvgSAT,
     col = "blue",
     main = "Distribution of Average SAT Scores",
     xlab = "SAT Score")

The histogram shows that the distribution of average SAT scores across colleges is roughly bell-shaped and slightly right-skewed, with most colleges reporting average SAT scores between 1000 and 1200. Very few colleges have average SAT scores below 800 or above 1400, suggesting that extremely low- or high-scoring institutions are rare in this dataset.
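A visual impression of skewness can be checked numerically. The sketch below, on made-up scores rather than the real AvgSAT column, compares the mean with the median and computes a simple moment-based skewness coefficient; a mean above the median and a positive coefficient both point to a right skew.

```r
# Hypothetical average SAT scores with one high value and one NA.
sat <- c(950, 1000, 1050, 1050, 1100, 1100, 1150, 1200, 1450, NA)
sat_obs <- sat[!is.na(sat)]

# In a right-skewed distribution the mean is pulled above the median.
m   <- mean(sat_obs)
med <- median(sat_obs)

# Moment-based skewness: third central moment divided by the
# second central moment raised to the power 3/2.
dev  <- sat_obs - m
skew <- mean(dev^3) / mean(dev^2)^1.5

c(mean = m, median = med, skewness = skew)
```

Base R has no built-in skewness function, which is why the coefficient is computed directly here.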

Q5: How is median family income distributed across colleges?

hist(college$MedIncome,
     col = "green",
     main = "Distribution of Median Family Income",
     xlab = "Income (in $1,000s)")

The histogram shows that the distribution of colleges' median family income is right-skewed, with most colleges falling between $20,000 and $60,000 (MedIncome is recorded in thousands of dollars).

Q6: How does completion rate vary across regions?

boxplot(CompRate ~ Region,
        data = college,
        col = "orange",
        main = "Completion Rate by Region",
        xlab = "Region",
        ylab = "Completion Rate")

The Northeast and Midwest regions have higher median completion rates than the other regions, with many schools reaching 60–70%.
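Medians read off a boxplot can be confirmed with a grouped summary. The sketch below uses a small hypothetical data frame (`toy`) with the same column names as the real data; the values are invented for illustration.

```r
# Hypothetical completion rates (in percent) for six colleges.
toy <- data.frame(
  Region   = c("Midwest", "Midwest", "Northeast", "Northeast", "West", "West"),
  CompRate = c(55, 65, 70, NA, 40, 50)
)

# Median completion rate per region; na.rm = TRUE is passed on to median().
med_by_region <- tapply(toy$CompRate, toy$Region, median, na.rm = TRUE)

# Sort so the regions with the highest medians come first.
sort(med_by_region, decreasing = TRUE)
```

Applied to the full dataset, the same call would give the exact medians behind the center lines of the boxplot.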

Q7: Compare faculty salaries between colleges offering bachelor’s vs graduate degrees.

boxplot(FacSalary ~ HighDegree,
        data = college,
        col = "purple",
        main = "Faculty Salary by Highest Degree",
        xlab = "Highest Degree (0–4)",
        ylab = "Faculty Salary ($)")

The boxplot reveals a clear pattern: faculty salaries tend to increase with the highest degree level offered by a college.
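Since the question compares bachelor's and graduate institutions specifically, one option is to restrict the data to those two levels and label them. The sketch below assumes that HighDegree codes 3 and 4 stand for bachelor's and graduate degrees; that coding is an assumption to check against the dataset documentation, and the data frame `toy` is hypothetical.

```r
# Hypothetical data with the same column names as the real dataset.
toy <- data.frame(
  HighDegree = c(2, 3, 3, 4, 4, 4),
  FacSalary  = c(5000, 6000, 7000, 8000, 9000, NA)
)

# Keep only the two levels being compared
# (ASSUMPTION: 3 = bachelor's, 4 = graduate).
two_levels <- subset(toy, HighDegree %in% c(3, 4))
two_levels$Degree <- factor(two_levels$HighDegree,
                            levels = c(3, 4),
                            labels = c("Bachelor's", "Graduate"))

# Median faculty salary per group; the formula interface drops rows
# with a missing FacSalary by default.
med_salary <- aggregate(FacSalary ~ Degree, data = two_levels, FUN = median)
med_salary
```

Using a labeled factor also gives the boxplot readable group names instead of numeric codes.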

Q8: Is there a correlation between admission rate and average SAT score?

cor(college$AdmitRate, college$AvgSAT, use = "complete.obs")
## [1] -0.4221255

The correlation of about -0.42 is moderate and negative: colleges with lower admission rates (more selective schools) tend to have higher average SAT scores among their admitted students. The relationship is far from perfectly linear, but it is consistent with the idea that more competitive schools tend to attract higher-scoring applicants.
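A correlation coefficient is easier to interpret alongside its square and a significance test. For the value reported above, squaring -0.4221 gives roughly 0.18, so about 18% of the variation in one variable is linearly associated with the other. The sketch below shows the same steps with `cor.test()` on a small hypothetical data frame; the numbers are invented and do not reproduce the real result.

```r
# Hypothetical admission rates and average SAT scores for eight colleges.
toy <- data.frame(
  AdmitRate = c(0.95, 0.85, 0.80, 0.70, 0.60, 0.45, 0.30, NA),
  AvgSAT    = c(1000, 1080, 1020, 1150, 1100, 1250, 1400, 1200)
)

# Pearson correlation on rows where both values are present.
r <- cor(toy$AdmitRate, toy$AvgSAT, use = "complete.obs")

# Share of variance in one variable linearly associated with the other.
r_squared <- r^2

# cor.test() drops incomplete pairs itself and adds a p-value
# and a confidence interval for the correlation.
test <- cor.test(toy$AdmitRate, toy$AvgSAT)

r
r_squared
test$p.value
test$conf.int
```

A confidence interval that excludes zero would support describing the association as more than sampling noise, although it still says nothing about causation.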

Q9: Is there a relationship between net price and median income?

cor(college$NetPrice, college$MedIncome, use = "complete.obs")
## [1] 0.5151298

The correlation of about 0.52 is moderate and positive: colleges whose students come from wealthier families tend to have higher net prices, possibly because those students more often attend expensive private or out-of-state schools.

Q10: How many colleges are in each region of the country?

barplot(table(college$Region),
        col = "lightblue",
        main = "Number of Colleges by Region",
        ylab = "Count")

The barplot shows that the Northeast region has the highest number of colleges, followed closely by the Midwest, Southeast, and West regions. The Territory region (which includes U.S. territories like Puerto Rico or Guam) has very few colleges compared to the others.
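Bar heights can be hard to compare when regions are close, so a sorted table of counts and percentages is a useful companion to the plot. The sketch below uses a short hypothetical vector of region labels, not the real Region column.

```r
# Hypothetical region labels for eight colleges.
region <- c("Midwest", "Northeast", "Northeast", "Southeast",
            "West", "Northeast", "Midwest", "Territory")

# Counts per region, largest first.
counts <- sort(table(region), decreasing = TRUE)

# The same counts expressed as percentages of all colleges.
shares <- round(100 * prop.table(counts), 1)

counts
shares
```

Passing the sorted table to `barplot()` would also draw the bars in descending order, which makes the ranking of regions easier to read.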

3. Summary

This report explored various aspects of U.S. bachelor’s-granting colleges using descriptive statistics and visualizations in R. Numerically, the mean admission rate is about 67% and the median in-state tuition is $17,662. The standard deviation of student debt, about $5,361, points to substantial variation in borrowing across colleges. Histograms showed that most colleges have average SAT scores between 1000 and 1200, and that median family income is right-skewed, with most colleges falling between $20,000 and $60,000. Completion rates varied by region, with Northeast and Midwest schools showing higher medians, while faculty salaries increased with the highest degree offered. A moderate negative correlation (about -0.42) between admission rate and average SAT score suggests that more selective schools tend to admit higher-scoring students. Net price was positively correlated with median family income (about 0.52), indicating that colleges serving higher-income students tend to be more expensive. Finally, a barplot showed that the Northeast has the most colleges, followed closely by the Midwest, Southeast, and West, while the Territory region has very few.