1. Introduction

We use data on over two thousand colleges, including information about admission rates, tuition costs, ACT/SAT scores, and student demographics.

I propose the following 10 questions based on my own understanding of the data:

  1. What is the mean ACT score for all public colleges?
  2. What is the variance in latitude for all colleges in the Southeast?
  3. What is the mean cost of tuition for all colleges?
  4. What is the standard deviation of ACT scores?
  5. Is there correlation between average ACT scores and completion rates?
  6. Show the distribution of ACT scores for private colleges (histogram).
  7. What is the correlation between cost and completion rates?
  8. Show the distribution of SAT scores (histogram).
  9. What is the difference between the mean net price of colleges in the west and mean net price of colleges in the midwest?
  10. What is the median tuition for all colleges?

I asked ChatGPT to come up with 10 questions relating to the data set:

  1. What is the average admission rate by region? (Barplot)
  2. What are the mean and median student debt amounts for students across all institutions?
  3. Is there a relationship between faculty salary and tuition cost?
  4. How does the average completion rate differ between colleges with high and low percentages of first-generation students?
  5. Does the percentage of part-time students impact the completion rate of a college?
  6. How does the distribution of median income vary across different institutions? (Boxplot)
  7. What is the average admission rate by region? (Barplot)
  8. What are the mean and median values of tuition costs for in-state and out-of-state students?
  9. What percentage of institutions offer online programs vs. traditional on-campus education? (Pie Chart)
  10. How does the average percentage of part-time students vary by region? (Barplot)

2. Analysis

Ten of these questions will be examined in detail.

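# Load the Lock5 CollegeScores4yr data (one row per four-year college)
college = read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
# Preview the first six rows
head(college)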
##                                  Name State     ID Main
## 1            Alabama A & M University    AL 100654    1
## 2 University of Alabama at Birmingham    AL 100663    1
## 3                  Amridge University    AL 100690    1
## 4 University of Alabama in Huntsville    AL 100706    1
## 5            Alabama State University    AL 100724    1
## 6           The University of Alabama    AL 100751    1
##                                                                Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
##   MainDegree HighDegree Control    Region Locale Latitude Longitude AdmitRate
## 1          3          4  Public Southeast   City 34.78337 -86.56850    0.9027
## 2          3          4  Public Southeast   City 33.50570 -86.79935    0.9181
## 3          3          4 Private Southeast   City 32.36261 -86.17401        NA
## 4          3          4  Public Southeast   City 34.72456 -86.64045    0.8123
## 5          3          4  Public Southeast   City 32.36432 -86.29568    0.9787
## 6          3          4  Public Southeast   City 33.21187 -87.54598    0.5330
##   MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1     18    929      0       4824   2.5  90.7      0.9   0.2   5.6      6.6
## 2     25   1195      0      12866  57.8  25.9      3.3   5.9   7.1     25.2
## 3     NA     NA      1        322   7.1  14.3      0.6   0.3  77.6     54.4
## 4     28   1322      0       6917  74.2  10.7      4.6   4.0   6.5     15.0
## 5     18    935      0       4189   1.5  93.8      1.0   0.3   3.5      7.7
## 6     28   1278      0      32387  78.5  10.1      4.7   1.2   5.6      7.9
##   NetPrice  Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1    15184 22886      9857     18236       9227        7298      6983
## 2    17535 24129      8328     19032      11612       17235     10640
## 3     9649 15080      6900      6900      14738        5265      3866
## 4    19986 22108     10280     21480       8727        9748      9391
## 5    12874 19413     11068     19396       9003        7983      7399
## 6    21973 28836     10780     28100      13574       10894     10016
##   FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1        71.3 71.0    23.96 1068   56.4     36.6      23.6
## 2        89.9 35.3    52.92 3755   63.9     34.1      34.5
## 3       100.0 74.2    18.18  109   64.9     51.3      15.0
## 4        64.6 27.7    48.62 1347   47.6     31.0      44.8
## 5        54.2 73.8    27.69 1294   61.3     34.3      22.1
## 6        74.0 18.0    67.87 6430   61.5     22.6      66.7
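Several rows have missing entries (for example, Amridge University in row 3 has no AdmitRate, MidACT, or AvgSAT), which is why the analyses below drop NA values before computing statistics. A minimal sketch of counting missing values per column, shown on a small hypothetical data frame rather than the real `college` data (on the real data the same call would be `colSums(is.na(college))`):

```r
# Hypothetical toy data frame standing in for `college`
toy = data.frame(
  MidACT   = c(18, 25, NA, 28),
  AvgSAT   = c(929, 1195, NA, NA),
  CompRate = c(23.96, 52.92, 18.18, 48.62)
)

# is.na() gives a logical matrix; colSums() counts the TRUE values per column
missing_per_column = colSums(is.na(toy))
print(missing_per_column)

# na.omit() on selected columns keeps only the rows complete in those columns,
# mirroring the filtering used in the questions below
complete_act = na.omit(toy[, c("MidACT", "CompRate")])
nrow(complete_act)
```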

Q1: What is the mean ACT score across all colleges?

trunc(mean(college$MidACT, na.rm=TRUE))
## [1] 23

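Averaged over all colleges that report ACT data, the mean of the colleges' median ACT scores (MidACT) is about 23. This is an unweighted average over institutions rather than over students, it includes private as well as public colleges, and trunc() drops the decimal part rather than rounding.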
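To answer the question as originally proposed (public colleges only, and per student rather than per college), two refinements may help. Restricting to rows where `Control == "Public"` limits the mean to public colleges, and weighting each college's MidACT by its `Enrollment` with `weighted.mean()` approximates an average over students (only an approximation, since MidACT is a median and not every enrolled student takes the ACT). The sketch below uses a small hypothetical data frame; on the real data, `toy` would be replaced by `college`.

```r
# Hypothetical toy data; the real analysis would use `college`
toy = data.frame(
  Control    = c("Public", "Private", "Public", "Public", "Public"),
  MidACT     = c(18, 30, 25, NA, 22),
  Enrollment = c(4000, 1500, 12000, 300, 2000)
)

# Restrict to public colleges before averaging
public = subset(toy, Control == "Public")

# Unweighted mean over colleges: (18 + 25 + 22) / 3
mean_public = mean(public$MidACT, na.rm = TRUE)
trunc(mean_public)  # 21: drops the fractional part
round(mean_public)  # 22: rounds to the nearest integer

# Enrollment-weighted mean; na.rm = TRUE also drops the matching weights
weighted_public = weighted.mean(public$MidACT, public$Enrollment, na.rm = TRUE)
```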

Q2: What is the difference between the mean net price of colleges in the West and the mean net price of colleges in the Midwest?

filteredCollege = na.omit(college[, c("Region", "NetPrice")])
west_subset = subset(filteredCollege, Region == "West")
midwest_subset = subset(filteredCollege, Region == "Midwest")

mean(west_subset$NetPrice) - mean(midwest_subset$NetPrice)
## [1] 436.4873

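On average, the net price of colleges in the West is 436.49 USD higher than that of colleges in the Midwest. This is only about a 2.2% increase over the Midwest mean, which I would consider practically small; no hypothesis test was run, so this is a judgment about practical rather than statistical significance.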
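A percentage like the 2.2% above is presumably the difference in means divided by the Midwest mean, and whether the gap is distinguishable from zero could be checked with a two-sample t-test. A minimal sketch on hypothetical net prices (not the real data; on the real data, `west_subset$NetPrice` and `midwest_subset$NetPrice` would be used):

```r
# Hypothetical net prices (USD) for two small groups of colleges
west    = c(21000, 19500, 22400)
midwest = c(20100, 18900, 21200)

# Absolute difference in means, in USD
diff_means = mean(west) - mean(midwest)

# Relative difference, as a percentage of the Midwest mean
pct_increase = 100 * diff_means / mean(midwest)

# Welch two-sample t-test; a small p-value is evidence that the
# population means differ
test = t.test(west, midwest)
test$p.value
```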

Q3: What is the correlation between cost and completion rates?

filteredCollege = na.omit(college[, c("Cost", "CompRate")])
cor(filteredCollege$Cost, filteredCollege$CompRate)
## [1] 0.5870019

The correlation between cost and completion rate is 0.587, indicating a moderate positive relationship. This suggests that higher costs are generally associated with higher completion rates.
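The value returned by `cor()` is the Pearson correlation coefficient: the sum of the products of each pair's deviations from their means, divided by the square root of the product of the two sums of squared deviations. A short sketch on hypothetical numbers that computes it by hand, checks it against `cor()`, and shows that `use = "complete.obs"` drops incomplete pairs in the same way as the `na.omit()` step:

```r
# Hypothetical paired values (e.g., cost and completion rate)
x = c(1, 2, 3, 4, 5)
y = c(2, 4, 5, 4, 5)

# Pearson r: cross-products of deviations, scaled by both spreads
dx = x - mean(x)
dy = y - mean(y)
r_manual = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

r_builtin = cor(x, y)

# use = "complete.obs" drops any pair containing an NA before computing r
r_with_na = cor(c(x, NA), c(y, 3), use = "complete.obs")
```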

Q4: What are the mean and median values of tuition costs for in-state and out-of-state students?

filteredCollege = na.omit(college[, c("TuitionIn", "TuitonOut")])
mean(filteredCollege$TuitionIn)
## [1] 21948.55
mean(filteredCollege$TuitonOut)
## [1] 25336.66
median(filteredCollege$TuitionIn)
## [1] 17662
median(filteredCollege$TuitonOut)
## [1] 23606

The mean and median tuition for in-state students are 21,948.55 USD and 17,662.00 USD, respectively, while the mean and median tuition for out-of-state students are 25,336.66 USD and 23,606.00 USD, respectively. Out-of-state students therefore pay more in tuition than in-state students, both on average and at the median.
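The four values can also be produced as one small table, which makes the mean-versus-median comparison easier to read. When the mean exceeds the median, as it does for both tuition columns, a relatively small number of high-tuition colleges is likely pulling the mean upward (a right-skewed distribution). A minimal sketch on hypothetical values; on the real data, `filteredCollege` would be passed instead of `tuition`:

```r
# Hypothetical tuition values (USD); the column name TuitonOut keeps the
# spelling used in the real data set
tuition = data.frame(
  TuitionIn = c(9000, 12000, 40000),
  TuitonOut = c(20000, 24000, 40000)
)

# One column per tuition type, one row per statistic
tuition_stats = sapply(tuition, function(v) c(mean = mean(v), median = median(v)))
print(tuition_stats)
```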

Q5: Show the distribution of SAT scores (histogram).

filteredCollege = na.omit(college[, c("AvgSAT")])
hist(filteredCollege,
     main = "Distribution of SAT scores",
     xlab = "SAT score",
     ylab = "frequency",
     col = "lightblue",
     border = "black",
     breaks = 50)

Based on this histogram, the most common school-average SAT scores lie between 1000 and 1200. There also appear to be more schools with an average SAT score near 600 than schools with an average SAT score between 600 and 800.
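Reading bar heights by eye can be imprecise, especially with many bins. Calling `hist()` with `plot = FALSE` returns the bin edges and counts without drawing, so statements about which ranges hold more schools can be checked numerically. A sketch with hypothetical scores and 200-point bins (on the real data, `filteredCollege` would be passed instead of `sat`):

```r
# Hypothetical school-average SAT scores
sat = c(610, 650, 790, 1010, 1050, 1100, 1190, 1250, 1450)

# plot = FALSE returns the bin breaks and counts without drawing
h = hist(sat, breaks = seq(600, 1600, by = 200), plot = FALSE)

# Pair each bin's lower edge with its count
data.frame(lower = head(h$breaks, -1), count = h$counts)
```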

Q6: What is the variance in latitude for all colleges in the Southeast region?

south_data = subset(college, Region == "Southeast")
var(south_data$Latitude, na.rm = TRUE)
## [1] 11.85384

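The variance of latitude among colleges in the Southeast is about 11.854 squared degrees. Roughly speaking, this is the average squared difference between each school's latitude and the mean latitude (the sample variance divides by n - 1 rather than n). Its square root, the standard deviation, is about 3.44 degrees, which is easier to interpret because it is in the original units.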
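The sample variance can be computed directly from its definition and compared with `var()`. A minimal sketch on hypothetical latitudes, also showing that `sd()` is the square root of the variance and therefore back in degrees:

```r
# Hypothetical latitudes (degrees)
lat = c(30, 32, 35, 36, 37)

# Sample variance: squared deviations from the mean, divided by n - 1
n = length(lat)
var_manual = sum((lat - mean(lat))^2) / (n - 1)

# var() uses the same n - 1 denominator; sd() is its square root,
# in degrees rather than squared degrees
var_builtin = var(lat)
sd_builtin = sd(lat)
```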

Q7: Is there a correlation between median ACT scores and completion rates?

filteredCollege = na.omit(college[, c("MidACT", "CompRate")])
cor(filteredCollege$MidACT, filteredCollege$CompRate)
## [1] 0.8130616

The correlation between schools' median ACT scores (MidACT) and their completion rates is 0.813, indicating a strong positive relationship. This suggests that schools with higher median ACT scores generally have higher completion rates.
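`cor()` reports only the coefficient. `cor.test()` returns the same Pearson estimate together with a p-value and a 95% confidence interval for the population correlation, which helps judge how precisely the relationship is estimated. A sketch on hypothetical values (on the real data, `filteredCollege$MidACT` and `filteredCollege$CompRate` would be passed):

```r
# Hypothetical median ACT scores and completion rates (%)
act  = c(18, 21, 24, 27, 30, 33)
comp = c(25, 35, 40, 55, 60, 72)

# Same Pearson r as cor(), plus a p-value and a 95% confidence interval
ct = cor.test(act, comp)
ct$estimate
ct$conf.int
ct$p.value
```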

Q8: Is there a relationship between faculty salary and tuition cost?

filteredCollege = na.omit(college[, c("FacSalary", "Cost")])
cor(filteredCollege$FacSalary, filteredCollege$Cost)
## [1] 0.424201

The correlation between faculty salary and cost is 0.424, indicating a moderate positive relationship. This suggests that colleges with higher faculty salaries tend to have higher costs, although correlation alone does not show that one causes the other.

Q9: What is the difference between the average AdmitRate of public schools and the average AdmitRate of private schools?

filteredCollege = na.omit(college[, c("AdmitRate", "Control")])
public_subset = subset(filteredCollege, Control == "Public")
private_subset = subset(filteredCollege, Control == "Private")

mean(public_subset$AdmitRate) - mean(private_subset$AdmitRate)
## [1] 0.05084466

The average admission rate of public colleges is about 5.1 percentage points higher than that of private colleges (0.0508 as a proportion). Because this is an average of institution-level admission rates, it describes colleges rather than applicants: it does not mean that an individual applicant's chance of acceptance is 5% higher at a public school.
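Because AdmitRate is stored as a proportion, a difference of 0.0508 corresponds to about 5.1 percentage points, which is not the same as a 5% relative increase. A sketch on hypothetical data that computes both group means in one `tapply()` call and expresses the gap both ways (on the real data, the call would be `tapply(filteredCollege$AdmitRate, filteredCollege$Control, mean)`):

```r
# Hypothetical admission rates (proportions) and control types
toy = data.frame(
  Control   = c("Public", "Public", "Private", "Private"),
  AdmitRate = c(0.80, 0.70, 0.60, 0.65)
)

# Mean admission rate for each control type in one call
group_means = tapply(toy$AdmitRate, toy$Control, mean)
pub  = group_means[["Public"]]
priv = group_means[["Private"]]

# Absolute gap in percentage points
diff_points = 100 * (pub - priv)

# Relative gap, as a percentage of the private-college mean
rel_percent = 100 * (pub - priv) / priv
```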

Q10: What is the mean completion rate for schools where the percentage of female students is greater than 75%?

filteredCollege = na.omit(college[, c("Female", "CompRate")])
female_subset = subset(filteredCollege, Female > 75.0)
mean(female_subset$CompRate)
## [1] 44.14372

The mean completion rate at schools where more than 75% of students are female is 44.1%, which is about 8% lower than the mean completion rate across all colleges in the data set.
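Since CompRate is itself a percentage, a gap like this can be stated in percentage points (an absolute difference of two rates) or in percent (relative to the overall mean), and the two numbers differ. A sketch on hypothetical data that computes the subset mean, the overall mean, and both forms of the difference:

```r
# Hypothetical percentages of female students and completion rates (%)
toy = data.frame(
  Female   = c(80, 90, 50, 60, 78),
  CompRate = c(40, 44, 60, 56, 48)
)

subset_mean  = mean(toy$CompRate[toy$Female > 75])  # colleges above 75% female
overall_mean = mean(toy$CompRate)                   # all colleges

# Absolute gap, in percentage points
gap_points = overall_mean - subset_mean

# Relative gap, as a percentage of the overall mean
gap_percent = 100 * gap_points / overall_mean
```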