1. Introduction

We use the CollegeScores4yr dataset from https://www.lock5stat.com/datapage3e.html.

I propose the following 10 questions based on my understanding of the CollegeScores4yr data.

2. Analysis

We will explore the questions in detail.

college = read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)
##                                  Name State     ID Main
## 1            Alabama A & M University    AL 100654    1
## 2 University of Alabama at Birmingham    AL 100663    1
## 3                  Amridge University    AL 100690    1
## 4 University of Alabama in Huntsville    AL 100706    1
## 5            Alabama State University    AL 100724    1
## 6           The University of Alabama    AL 100751    1
##                                                                Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
##   MainDegree HighDegree Control    Region Locale Latitude Longitude AdmitRate
## 1          3          4  Public Southeast   City 34.78337 -86.56850    0.9027
## 2          3          4  Public Southeast   City 33.50570 -86.79935    0.9181
## 3          3          4 Private Southeast   City 32.36261 -86.17401        NA
## 4          3          4  Public Southeast   City 34.72456 -86.64045    0.8123
## 5          3          4  Public Southeast   City 32.36432 -86.29568    0.9787
## 6          3          4  Public Southeast   City 33.21187 -87.54598    0.5330
##   MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1     18    929      0       4824   2.5  90.7      0.9   0.2   5.6      6.6
## 2     25   1195      0      12866  57.8  25.9      3.3   5.9   7.1     25.2
## 3     NA     NA      1        322   7.1  14.3      0.6   0.3  77.6     54.4
## 4     28   1322      0       6917  74.2  10.7      4.6   4.0   6.5     15.0
## 5     18    935      0       4189   1.5  93.8      1.0   0.3   3.5      7.7
## 6     28   1278      0      32387  78.5  10.1      4.7   1.2   5.6      7.9
##   NetPrice  Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1    15184 22886      9857     18236       9227        7298      6983
## 2    17535 24129      8328     19032      11612       17235     10640
## 3     9649 15080      6900      6900      14738        5265      3866
## 4    19986 22108     10280     21480       8727        9748      9391
## 5    12874 19413     11068     19396       9003        7983      7399
## 6    21973 28836     10780     28100      13574       10894     10016
##   FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1        71.3 71.0    23.96 1068   56.4     36.6      23.6
## 2        89.9 35.3    52.92 3755   63.9     34.1      34.5
## 3       100.0 74.2    18.18  109   64.9     51.3      15.0
## 4        64.6 27.7    48.62 1347   47.6     31.0      44.8
## 5        54.2 73.8    27.69 1294   61.3     34.3      22.1
## 6        74.0 18.0    67.87 6430   61.5     22.6      66.7

Q1. What is the mean cost across all colleges in the data?

mean(college$Cost, na.rm = TRUE)
## [1] 34277.31

The mean cost is $34,277.31.
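For reporting, the raw value can be rounded and given a thousands separator. The following is a minimal sketch using base R's `formatC`; the helper name `format_dollars` is hypothetical, and the input value is copied from the output above rather than recomputed.

```r
# Hypothetical helper: format a number as dollars with two decimals
# and a comma as the thousands separator.
format_dollars <- function(x) {
  paste0("$", formatC(x, format = "f", digits = 2, big.mark = ","))
}

mean_cost <- 34277.31  # value printed by mean(college$Cost, na.rm = TRUE)
format_dollars(mean_cost)
```

The same helper could be applied to any other dollar-valued result in this report.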

Q2. What is the correlation between cost and average SAT score?

cor(college$Cost, college$AvgSAT, use = "complete.obs")
## [1] 0.5373884

The correlation between cost and average SAT score is 0.5373884.
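Because some colleges are missing `Cost` or `AvgSAT`, the correlation is computed only on rows where both values are present. The sketch below illustrates what `use = "complete.obs"` does on a small toy data frame; the values are made up for illustration and are not taken from the dataset.

```r
# Toy data (hypothetical values, not from CollegeScores4yr).
toy <- data.frame(
  Cost   = c(20000, 25000, NA, 40000, 52000),
  AvgSAT = c(950, NA, 1100, 1200, 1350)
)

# use = "complete.obs" drops any row where either variable is NA.
r_complete <- cor(toy$Cost, toy$AvgSAT, use = "complete.obs")

# Equivalent manual version: keep only rows with both values present.
keep <- complete.cases(toy$Cost, toy$AvgSAT)
r_manual <- cor(toy$Cost[keep], toy$AvgSAT[keep])

all.equal(r_complete, r_manual)  # the two approaches agree
sum(keep)                        # number of rows actually used
```

Running `sum(complete.cases(college$Cost, college$AvgSAT))` on the real data would show how many colleges contribute to the reported correlation.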

Q3. What is the distribution of cost?

hist(college$Cost, main = "Histogram of Cost", xlab = "Cost", col = "red")

The distribution of cost is shown above in the histogram, Figure 1.
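A histogram can also be summarized numerically. The sketch below, run on a toy vector with made-up values, shows how `hist(..., plot = FALSE)` returns the bin breaks and counts, and how comparing the mean with the median gives a rough indication of skew. The same calls should work on `college$Cost`.

```r
# Toy cost values (hypothetical), including one missing value.
cost <- c(12000, 18000, 22000, 25000, 31000, 36000, 48000, 65000, NA)

# plot = FALSE returns the bin breaks and counts without drawing.
# Missing values are dropped before binning.
h <- hist(cost, plot = FALSE)
data.frame(lower = head(h$breaks, -1), upper = h$breaks[-1], count = h$counts)

# A mean above the median is one simple sign of right skew.
mean(cost, na.rm = TRUE) > median(cost, na.rm = TRUE)
```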

Q4. What is the correlation between median income and average SAT score?

cor(college$MedIncome, college$AvgSAT, use = "complete.obs")
## [1] 0.583898

The correlation between median income and average SAT score is 0.583898.

Q5. What is the standard deviation of net price (NetPrice)?

sd(college$NetPrice, na.rm = TRUE)
## [1] 7854.096

The standard deviation of net price is $7,854.10.

Q6. What is the mean female student percentage?

mean(college$Female, na.rm = TRUE)
## [1] 59.29588

The mean female student percentage is 59.29588%.

Q7. What is the correlation between faculty salary and completion rate?

cor(college$FacSalary, college$CompRate, use = "complete.obs")
## [1] 0.577221

The correlation between faculty salary and completion rate is 0.577221.

Q8. What is the mean completion rate?

mean(college$CompRate, na.rm = TRUE)
## [1] 52.13524

The mean completion rate is 52.13524%.

Q9. What is the average admission rate (AdmitRate)?

mean(college$AdmitRate, na.rm = TRUE)
## [1] 0.6702025

The average admission rate is 67.02025%.

Q10. What is the correlation between online-only status (Online) and completion rate?

cor(college$Online, college$CompRate, use = "complete.obs")
## [1] -0.08915002

The correlation between online-only status and completion rate is -0.08915002.
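Assuming `Online` is a 0/1 indicator, as the rows printed by `head(college)` suggest, this Pearson correlation is a point-biserial correlation, and its sign matches the difference in mean completion rate between the two groups. The sketch below shows this on a toy data frame with made-up values.

```r
# Toy data (hypothetical values, not from CollegeScores4yr).
toy <- data.frame(
  Online   = c(0, 0, 0, 0, 1, 1),
  CompRate = c(60, 55, 48, 70, 40, 52)
)

# Pearson correlation with a 0/1 variable (point-biserial correlation).
r <- cor(toy$Online, toy$CompRate, use = "complete.obs")

# Mean completion rate within each group.
group_means <- tapply(toy$CompRate, toy$Online, mean, na.rm = TRUE)
mean_diff <- group_means[["1"]] - group_means[["0"]]

# The correlation and the difference in means share the same sign.
sign(r) == sign(mean_diff)
```

Reporting the group means alongside the correlation often makes a result involving a binary variable easier to interpret.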

3. Summary

This study explores various statistical properties of the College Scores 4yr dataset, available at Lock5Stat. The analysis focuses on data related to cost, admission rates, faculty salaries, and student demographics.

Key findings include:

The mean cost of college is $34,277.31.

Cost and average SAT scores have a moderate positive correlation (0.537), suggesting that higher-cost institutions tend to attract students with higher SAT scores.

The distribution of costs is visualized using a histogram.

Median income and average SAT scores are positively correlated (0.584), indicating a relationship between family income and student performance.

The standard deviation of net price is $7,854.10, reflecting variation in financial burden across institutions.

On average, 59.3% of students are female.

Faculty salary and completion rate show a moderate positive correlation (0.577): institutions with better-paid faculty tend to have higher graduation rates, although correlation alone does not establish that one causes the other.

The mean completion rate across colleges is 52.1%.

The average admission rate is 67%, meaning that institutions admit about two-thirds of their applicants on average.

Online-only status has a very weak negative correlation with completion rate (-0.089), which is too small to support firm conclusions about online students.

This analysis provides insights into how different factors, such as cost, income, and institutional characteristics, relate to student outcomes.