1. Introduction

We use the data from ….

I propose the following 10 questions, based on my own understanding of the data and on suggestions from ChatGPT.

2. Analysis

We will explore the questions in detail.

college = read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)
##                                  Name State     ID Main
## 1            Alabama A & M University    AL 100654    1
## 2 University of Alabama at Birmingham    AL 100663    1
## 3                  Amridge University    AL 100690    1
## 4 University of Alabama in Huntsville    AL 100706    1
## 5            Alabama State University    AL 100724    1
## 6           The University of Alabama    AL 100751    1
##                                                                Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
##   MainDegree HighDegree Control    Region Locale Latitude Longitude AdmitRate
## 1          3          4  Public Southeast   City 34.78337 -86.56850    0.9027
## 2          3          4  Public Southeast   City 33.50570 -86.79935    0.9181
## 3          3          4 Private Southeast   City 32.36261 -86.17401        NA
## 4          3          4  Public Southeast   City 34.72456 -86.64045    0.8123
## 5          3          4  Public Southeast   City 32.36432 -86.29568    0.9787
## 6          3          4  Public Southeast   City 33.21187 -87.54598    0.5330
##   MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1     18    929      0       4824   2.5  90.7      0.9   0.2   5.6      6.6
## 2     25   1195      0      12866  57.8  25.9      3.3   5.9   7.1     25.2
## 3     NA     NA      1        322   7.1  14.3      0.6   0.3  77.6     54.4
## 4     28   1322      0       6917  74.2  10.7      4.6   4.0   6.5     15.0
## 5     18    935      0       4189   1.5  93.8      1.0   0.3   3.5      7.7
## 6     28   1278      0      32387  78.5  10.1      4.7   1.2   5.6      7.9
##   NetPrice  Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1    15184 22886      9857     18236       9227        7298      6983
## 2    17535 24129      8328     19032      11612       17235     10640
## 3     9649 15080      6900      6900      14738        5265      3866
## 4    19986 22108     10280     21480       8727        9748      9391
## 5    12874 19413     11068     19396       9003        7983      7399
## 6    21973 28836     10780     28100      13574       10894     10016
##   FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1        71.3 71.0    23.96 1068   56.4     36.6      23.6
## 2        89.9 35.3    52.92 3755   63.9     34.1      34.5
## 3       100.0 74.2    18.18  109   64.9     51.3      15.0
## 4        64.6 27.7    48.62 1347   47.6     31.0      44.8
## 5        54.2 73.8    27.69 1294   61.3     34.3      22.1
## 6        74.0 18.0    67.87 6430   61.5     22.6      66.7

Q1. What is the mean cost for all the colleges in the data?

mean(college$Cost, na.rm = TRUE)
## [1] 34277.31

The mean cost is $34,277.

Q2. What is the correlation between cost and average SAT score?

cor(college$Cost, college$AvgSAT, use = "complete.obs")
## [1] 0.5373884

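The correlation coefficient is about 0.54, which indicates a moderate positive linear relationship: colleges with higher average SAT scores tend to cost more. A correlation is a unitless value between -1 and 1, not a percentage.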
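The use = "complete.obs" option matters here because both columns contain missing values. As a minimal sketch, assuming a small made-up data frame (toy, not taken from the college data), the following shows that cor() with this option matches Pearson's r computed by hand on the rows where both values are present:

```r
# Toy data, made up for illustration; not taken from CollegeScores4yr
toy <- data.frame(
  Cost   = c(20000, 25000, NA, 40000, 55000),
  AvgSAT = c(950, 1000, 1100, NA, 1300)
)

# use = "complete.obs" drops every row where either variable is NA
r_builtin <- cor(toy$Cost, toy$AvgSAT, use = "complete.obs")

# The same value from the definition of Pearson's r on the complete rows
ok <- complete.cases(toy$Cost, toy$AvgSAT)
x <- toy$Cost[ok]
y <- toy$AvgSAT[ok]
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

print(c(builtin = r_builtin, manual = r_manual))
```

Only three of the five toy rows are complete, so both calculations use those three rows.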

Q3. What is the distribution of cost?

hist(college$Cost, main = "Histogram of Cost", xlab = "Cost")

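The histogram shows how cost is spread across colleges, from $5,950 to $72,717. Because the mean ($34,277) is higher than the median ($30,699), the distribution appears to be skewed to the right, with a smaller number of high-cost colleges pulling the mean upward.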
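A histogram can also be read numerically. The sketch below uses a made-up vector of costs (cost, an assumption for illustration only) to show how hist() with plot = FALSE returns the bin counts, and how comparing the mean with the median gives a rough hint about skew:

```r
# Toy costs, made up for illustration; not taken from CollegeScores4yr
cost <- c(12000, 15000, 18000, 20000, 22000, 25000, 30000, 45000, 70000, NA)

# plot = FALSE returns the bins instead of drawing; non-finite values are dropped
h <- hist(cost, breaks = seq(0, 80000, by = 20000), plot = FALSE)
bins <- data.frame(
  lower = head(h$breaks, -1),
  upper = tail(h$breaks, -1),
  count = h$counts
)
print(bins)

# A positive difference (mean above median) hints at a right-skewed distribution
skew_hint <- mean(cost, na.rm = TRUE) - median(cost, na.rm = TRUE)
print(skew_hint)
```

The mean-versus-median comparison is only a rule of thumb; the histogram itself remains the better guide to the shape.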

Q4. What is the median of cost for all the colleges in the data?

median(college$Cost, na.rm = TRUE)
## [1] 30699

The median of the cost is $30,699.

Q5. What is the range of cost for all the colleges in the data?

range(college$Cost, na.rm = TRUE)
## [1]  5950 72717

Cost ranges from $5,950 to $72,717, a spread of $66,767.

Q6. What is the variance of the college cost?

var(college$Cost, na.rm = TRUE)
## [1] 233433900

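The variance is roughly 233,433,900 in squared dollars. Its square root, the standard deviation, is about $15,280, which is easier to interpret because it is in the same units as cost.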
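Because variance is measured in squared units, it is often reported alongside the standard deviation. A minimal sketch, assuming a made-up vector (cost) for the first part and the variance printed above for the second:

```r
# Toy costs, made up for illustration; not taken from CollegeScores4yr
cost <- c(10000, 20000, 30000, 40000, NA)

v <- var(cost, na.rm = TRUE)  # sample variance, in squared dollars
s <- sd(cost, na.rm = TRUE)   # sample standard deviation, in dollars

# sd() is the square root of var()
print(all.equal(sqrt(v), s))

# Standard deviation implied by the variance reported for college$Cost
sd_reported <- sqrt(233433900)
print(sd_reported)
```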

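Q7. What is the mode of the college cost?

mode(college$Cost)
## [1] "numeric"

This output does not answer the question. In R, mode() reports the storage type of an object ("numeric" here), not its most frequent value, so the statistical mode of cost has to be computed separately, for example from a frequency count of the values.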
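Base R has no built-in function for the statistical mode, so one has to be written. The sketch below defines a hypothetical helper, stat_mode() (not part of base R or of the original analysis), and tries it on a made-up vector; it returns every value tied for the highest count:

```r
# Hypothetical helper: most frequent value(s) of a vector, ignoring NA
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  ux <- unique(x)                    # distinct values, in order of appearance
  counts <- tabulate(match(x, ux))   # how often each distinct value occurs
  ux[counts == max(counts)]          # all values tied for the top count
}

# Toy costs, made up for illustration; not taken from CollegeScores4yr
toy_cost <- c(20000, 25000, 25000, 30000, NA, 30000, 25000)

print(stat_mode(toy_cost))  # most frequent value
print(mode(toy_cost))       # storage type, not a statistic
```

For a nearly continuous variable such as cost, most values may occur only once, in which case the mode is not very informative and the histogram peak is a better description.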

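Q8. What is the correlation between average SAT score and completion rate?

cor(college$AvgSAT, college$CompRate, use = "complete.obs")
## [1] 0.8189495

The correlation coefficient is about 0.82, a strong positive linear relationship: colleges with higher average SAT scores tend to have higher completion rates. This value is a correlation, not a graduation percentage.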

Q9. What is the range of SAT scores in the dataset?

range(college$AvgSAT, na.rm = TRUE)
## [1]  564 1558

Average SAT scores range from 564 to 1558.

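Q10. What is the acceptance rate of black applicants?

median(college$AdmitRate, na.rm = TRUE)
## [1] 0.69505

The data cannot answer this question directly. AdmitRate is an overall admission rate for each college, and Black is the percentage of a college's students who are Black; neither records admissions by race. What the code computes is the median admission rate across all colleges, which is about 69.5%.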
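median() summarizes only its first argument; a second vector passed positionally, such as college$Black, is silently ignored. A minimal sketch with made-up values (admit and black are illustrative names), assuming a recent version of R in which median() accepts extra arguments:

```r
# Toy values, made up for illustration; not taken from CollegeScores4yr
admit <- c(0.90, 0.55, NA, 0.70)      # admission rates
black <- c(90.7, 25.9, 14.3, 10.7)    # percent of students who are Black

# The second vector is absorbed by `...` and has no effect on the result
with_extra <- median(admit, black, na.rm = TRUE)
alone <- median(admit, na.rm = TRUE)

print(c(with_extra = with_extra, alone = alone))
```

Both calls return the same number, so passing a second column does not restrict the calculation to any group of students.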

3. Summary

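The results were broadly in line with my expectations. Cost varies widely across colleges, from $5,950 to $72,717, with a mean ($34,277) above the median ($30,699). Cost is moderately correlated with average SAT score (about 0.54), and average SAT score is strongly correlated with completion rate (about 0.82), so these college measures tend to move together rather than independently.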