Introduction

We use the CollegeScores4yr data from https://www.lock5stat.com/datapage3e.html to answer 10 questions, which we propose based on our own reading of the data. Each question is stated and answered in the Analysis section.

Analysis

We will explore the questions in detail.

college = read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)
##                                  Name State     ID Main
## 1            Alabama A & M University    AL 100654    1
## 2 University of Alabama at Birmingham    AL 100663    1
## 3                  Amridge University    AL 100690    1
## 4 University of Alabama in Huntsville    AL 100706    1
## 5            Alabama State University    AL 100724    1
## 6           The University of Alabama    AL 100751    1
##                                                                Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
##   MainDegree HighDegree Control    Region Locale Latitude Longitude AdmitRate
## 1          3          4  Public Southeast   City 34.78337 -86.56850    0.9027
## 2          3          4  Public Southeast   City 33.50570 -86.79935    0.9181
## 3          3          4 Private Southeast   City 32.36261 -86.17401        NA
## 4          3          4  Public Southeast   City 34.72456 -86.64045    0.8123
## 5          3          4  Public Southeast   City 32.36432 -86.29568    0.9787
## 6          3          4  Public Southeast   City 33.21187 -87.54598    0.5330
##   MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1     18    929      0       4824   2.5  90.7      0.9   0.2   5.6      6.6
## 2     25   1195      0      12866  57.8  25.9      3.3   5.9   7.1     25.2
## 3     NA     NA      1        322   7.1  14.3      0.6   0.3  77.6     54.4
## 4     28   1322      0       6917  74.2  10.7      4.6   4.0   6.5     15.0
## 5     18    935      0       4189   1.5  93.8      1.0   0.3   3.5      7.7
## 6     28   1278      0      32387  78.5  10.1      4.7   1.2   5.6      7.9
##   NetPrice  Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1    15184 22886      9857     18236       9227        7298      6983
## 2    17535 24129      8328     19032      11612       17235     10640
## 3     9649 15080      6900      6900      14738        5265      3866
## 4    19986 22108     10280     21480       8727        9748      9391
## 5    12874 19413     11068     19396       9003        7983      7399
## 6    21973 28836     10780     28100      13574       10894     10016
##   FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1        71.3 71.0    23.96 1068   56.4     36.6      23.6
## 2        89.9 35.3    52.92 3755   63.9     34.1      34.5
## 3       100.0 74.2    18.18  109   64.9     51.3      15.0
## 4        64.6 27.7    48.62 1347   47.6     31.0      44.8
## 5        54.2 73.8    27.69 1294   61.3     34.3      22.1
## 6        74.0 18.0    67.87 6430   61.5     22.6      66.7

Q1: What is the mean of cost for all the colleges in the data?

mean(college$Cost, na.rm = TRUE)
## [1] 34277.31

The mean cost for the colleges in the data is $34,277.31.

Q2: What is the correlation between cost and average SAT score?

cor(college$Cost, college$AvgSAT, use = "complete.obs")
## [1] 0.5373884

The correlation between cost and average SAT score for the colleges in the data is about 0.537, a moderate positive relationship: colleges with higher average SAT scores tend to cost more.
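
Because `use = "complete.obs"` drops every college that is missing either value, it can help to report how many rows a correlation actually rests on. The following is a minimal sketch, assuming a hypothetical helper `cor_with_n` and a small made-up data frame `toy` standing in for `college`:

```r
# Hypothetical helper: Pearson correlation plus the number of complete pairs used.
cor_with_n <- function(x, y) {
  ok <- complete.cases(x, y)  # TRUE where both values are present
  list(r = cor(x[ok], y[ok]), n = sum(ok))
}

# Made-up values for illustration only (not taken from the dataset).
toy <- data.frame(
  Cost   = c(20000, 25000, NA, 40000, 52000),
  AvgSAT = c(950, 1010, 1100, NA, 1300)
)

res <- cor_with_n(toy$Cost, toy$AvgSAT)
res$n  # 3 complete pairs out of 5 rows
```

On the real data the call would be `cor_with_n(college$Cost, college$AvgSAT)`; the `r` component matches what `cor(..., use = "complete.obs")` returns.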

Q3: What is the distribution of cost?

hist(college$Cost, main = "Histogram of Cost", xlab = "Cost", col = "red")

The histogram above shows the distribution of cost for the colleges in the data.

Q4: What is the average admission rate for all the colleges in the data?

mean(college$AdmitRate, na.rm = TRUE)
## [1] 0.6702025

The average admission rate for the colleges in the data is about 67%.

Q5: What is the correlation between admission rate and completion rate?

cor(college$AdmitRate, college$CompRate, use = "complete.obs")
## [1] -0.3482341

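The correlation between admission rate and completion rate for the colleges in the data is about -0.348, a fairly weak negative relationship: colleges that admit a larger share of applicants tend to have lower completion rates.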

Q6: How is the percentage of part-time students distributed?

hist(college$PartTime, main = "Distribution of Part-Time Students", xlab = "Percent Part-Time", col = "blue")

The histogram above shows the distribution of the percentage of part-time students for the colleges in the data.
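
A histogram shows the shape of a distribution but not a single average. If a numeric answer is wanted, the same `mean(..., na.rm = TRUE)` pattern used for Q1 and Q4 applies. The sketch below uses a made-up vector (`part_time_toy`) in place of `college$PartTime`, so its results say nothing about the real data:

```r
# Made-up percentages for illustration only; NA marks a missing value.
part_time_toy <- c(5, 20, 50, NA, 10)

# na.rm = TRUE drops the NA before averaging; without it the result would be NA.
avg_part_time <- mean(part_time_toy, na.rm = TRUE)    # 21.25
med_part_time <- median(part_time_toy, na.rm = TRUE)  # 15
```

Reporting the median alongside the mean is a common choice for percentages like these, since a few colleges with very high part-time shares can pull the mean upward.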

Q7: What is the standard deviation of average student debt among all the colleges in the data?

sd(college$Debt, na.rm = TRUE)
## [1] 5360.986

The standard deviation of average student debt among the colleges in the data is about $5,360.99.

Q8: What is the variance in faculty salaries across all the colleges in the data?

var(college$FacSalary, na.rm = TRUE)
## [1] 6568988

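The variance in faculty salaries across the colleges in the data is about 6,568,988. Variance is expressed in squared units (here, squared dollars), so this number is not directly comparable to the salaries themselves.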
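
Taking the square root of a variance converts it back to the original units, which makes the spread easier to judge against the salary values shown in the data preview. A short check using the variance reported above (the variable names are only for illustration):

```r
# Variance of FacSalary as reported in the output above.
fac_var <- 6568988

# Standard deviation is the square root of the variance.
fac_sd <- sqrt(fac_var)
round(fac_sd)  # 2563
```

Calling `sd(college$FacSalary, na.rm = TRUE)` computes the same quantity directly.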

Q9: Which control type (Public, Private, or Profit) has the highest average cost?

tapply(college$Cost, college$Control, mean, na.rm = TRUE)
##  Private   Profit   Public 
## 41350.33 28861.96 21338.61

Private colleges have the highest average cost at $41,350.33, followed by for-profit colleges at $28,861.96 and public colleges at $21,338.61.
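
The same group comparison can also be written with `aggregate()`, which returns a data frame that is easy to sort. This is a sketch on a small made-up data frame (`toy`), since the point is the pattern rather than the numbers:

```r
# Made-up rows for illustration only (not taken from the dataset).
toy <- data.frame(
  Control = c("Public", "Private", "Profit", "Public", "Private", "Profit"),
  Cost    = c(20000, 40000, 30000, 22000, NA, 26000)
)

# Mean Cost per Control group. The formula interface drops rows with a
# missing Cost by default (na.action = na.omit).
by_control <- aggregate(Cost ~ Control, data = toy, FUN = mean)

# Order groups from most to least expensive.
by_control <- by_control[order(-by_control$Cost), ]
as.character(by_control$Control[1])  # "Private"
```

On the real data, replacing `toy` with `college` should reproduce the `tapply()` means above, sorted from highest to lowest.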

Q10: What is the relationship between median family income and average net price?

cor(college$MedIncome, college$NetPrice, use = "complete.obs")
## [1] 0.5151298

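The correlation between median family income and average net price for the colleges in the data is about 0.515, a moderate positive relationship: colleges whose students come from higher-income families tend to have higher net prices.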
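
One way to put the three correlations in this report on a common footing is to square them. For a simple linear relationship, r squared is the share of variance in one variable that a straight-line fit on the other accounts for. The values below are copied from the outputs above; the vector and element names are only for illustration:

```r
# Correlations reported earlier in this report.
r <- c(Cost_AvgSAT        =  0.5373884,
       AdmitRate_CompRate = -0.3482341,
       MedIncome_NetPrice =  0.5151298)

# Squaring removes the sign and gives the proportion of variance explained.
r_squared <- round(r^2, 3)
r_squared  # 0.289, 0.121, 0.265
```

Each of these relationships leaves well over half of the variation unexplained, and none of them on its own establishes cause and effect.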

Summary

This report uses the CollegeScores4yr dataset to explore patterns among U.S. four-year colleges. Using means, standard deviation, variance, correlation, and histograms, it examines college costs, admission rates, student characteristics, and outcomes. The goal is to explore how cost, test scores, and family income relate to college affordability and student success.

The analysis shows that college costs differ widely across the U.S. Private schools are on average the most expensive (about $41,350), and public schools the least (about $21,339). Colleges with higher average SAT scores tend to cost more, and colleges with higher admission rates tend to have lower completion rates. On average, schools admit about 67% of applicants, and the share of part-time students varies from college to college. Student debt and faculty pay also vary widely between colleges. Colleges whose students come from higher-income families tend to have higher net prices. Overall, the results show moderate associations between cost, selectivity, and student outcomes; correlations alone do not show that one causes the other.