Introduction

I propose the following 10 questions based on my understanding of the CollegeScores4yr data:

1. What is the mean admission rate (AdmitRate) across all schools?
2. What is the median undergraduate enrollment (Enrollment)?
3. What is the standard deviation of the in-state tuition (TuitionIn) costs?
4. What is the boxplot of the average combined SAT scores (AvgSAT) among schools?
5. How are debt levels (Debt) distributed among schools?
6. What percentage of schools are main campuses (Main)?
7. Does the percentage of part-time students (PartTime) correlate with the completion rate (CompRate)?
8. What proportion of schools have a high degree type of “graduate” (HighDegree)?
9. Is there a correlation between median family income (MedIncome) and net price (NetPrice)?
10. How is undergraduate enrollment (Enrollment) distributed across schools?

Analysis

I will now explore each question in detail:

college = read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)
##                                  Name State     ID Main
## 1            Alabama A & M University    AL 100654    1
## 2 University of Alabama at Birmingham    AL 100663    1
## 3                  Amridge University    AL 100690    1
## 4 University of Alabama in Huntsville    AL 100706    1
## 5            Alabama State University    AL 100724    1
## 6           The University of Alabama    AL 100751    1
##                                                                Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
##   MainDegree HighDegree Control    Region Locale Latitude Longitude AdmitRate
## 1          3          4  Public Southeast   City 34.78337 -86.56850    0.9027
## 2          3          4  Public Southeast   City 33.50570 -86.79935    0.9181
## 3          3          4 Private Southeast   City 32.36261 -86.17401        NA
## 4          3          4  Public Southeast   City 34.72456 -86.64045    0.8123
## 5          3          4  Public Southeast   City 32.36432 -86.29568    0.9787
## 6          3          4  Public Southeast   City 33.21187 -87.54598    0.5330
##   MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1     18    929      0       4824   2.5  90.7      0.9   0.2   5.6      6.6
## 2     25   1195      0      12866  57.8  25.9      3.3   5.9   7.1     25.2
## 3     NA     NA      1        322   7.1  14.3      0.6   0.3  77.6     54.4
## 4     28   1322      0       6917  74.2  10.7      4.6   4.0   6.5     15.0
## 5     18    935      0       4189   1.5  93.8      1.0   0.3   3.5      7.7
## 6     28   1278      0      32387  78.5  10.1      4.7   1.2   5.6      7.9
##   NetPrice  Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1    15184 22886      9857     18236       9227        7298      6983
## 2    17535 24129      8328     19032      11612       17235     10640
## 3     9649 15080      6900      6900      14738        5265      3866
## 4    19986 22108     10280     21480       8727        9748      9391
## 5    12874 19413     11068     19396       9003        7983      7399
## 6    21973 28836     10780     28100      13574       10894     10016
##   FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1        71.3 71.0    23.96 1068   56.4     36.6      23.6
## 2        89.9 35.3    52.92 3755   63.9     34.1      34.5
## 3       100.0 74.2    18.18  109   64.9     51.3      15.0
## 4        64.6 27.7    48.62 1347   47.6     31.0      44.8
## 5        54.2 73.8    27.69 1294   61.3     34.3      22.1
## 6        74.0 18.0    67.87 6430   61.5     22.6      66.7

Question 1: What is the mean admission rate (AdmitRate) across all schools?

mean(college$AdmitRate, na.rm = TRUE)
## [1] 0.6702025

The mean admission rate across all schools that report one is about 0.67, meaning that the average school admits roughly 67% of its applicants.
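The head() output above shows at least one school with a missing AdmitRate, which is why na.rm = TRUE is needed. The sketch below uses a small made-up vector (the values are hypothetical, not taken from CollegeScores4yr) to show what the argument changes and how many schools actually enter the average.

```r
# Hypothetical admission rates; NA stands for a school that did not report one
admit <- c(0.90, 0.50, NA, 0.70)

# Without na.rm, a single missing value makes the whole result NA
mean(admit)

# With na.rm = TRUE, only the reported rates are averaged
mean(admit, na.rm = TRUE)

# Number of schools that actually contribute to the mean
sum(!is.na(admit))
```

The same count, sum(!is.na(college$AdmitRate)), would show how many schools the reported mean is based on.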

Question 2: What is the median undergraduate enrollment (Enrollment)?

median(college$Enrollment, na.rm = TRUE)
## [1] 1722

The median undergraduate enrollment across all schools is 1,722 students.

Question 3: What is the standard deviation of the in-state tuition (TuitionIn) costs?

sd(college$TuitionIn, na.rm = TRUE)
## [1] 14130.3

The standard deviation of in-state tuition across all schools is about $14,130.

Question 4: What is the boxplot of the average combined SAT scores (AvgSAT) among schools?

boxplot(college$AvgSAT, main = "Boxplot of Average SAT Scores", ylab = "Average SAT Scores")

The boxplot shows that most schools have an average SAT score between roughly 1000 and 1200. The median sits around 1150, which tells us that about half of the schools have an average SAT score above this value.
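Values read off a boxplot are approximate, so it can help to print the numbers the plot is drawn from. The sketch below uses hypothetical SAT averages (not values from the dataset) and base R's fivenum(), which returns the minimum, lower hinge, median, upper hinge, and maximum.

```r
# Hypothetical average SAT scores for seven schools
sat <- c(900, 1000, 1050, 1100, 1150, 1200, 1400)

# Five-number summary: the quantities a boxplot displays
five <- fivenum(sat)
names(five) <- c("min", "lower hinge", "median", "upper hinge", "max")
five

# Interquartile range: the height of the box
IQR(sat)
```

Running fivenum(college$AvgSAT) on the real data (missing values are dropped by default) would give exact figures to quote alongside the plot.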

Question 5: How are debt levels (Debt) distributed among schools?

hist(college$Debt, main = "Average Debt Levels", xlab = "Debt", col = "black")

This histogram illustrates how debt levels are distributed among colleges, with the majority having debt below $10,000.
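A claim such as "the majority are below $10,000" can be checked numerically rather than by eye. The following is a minimal sketch on hypothetical debt values; the vector and the threshold variable are illustrative names, not part of the original analysis.

```r
# Hypothetical debt values; NA marks a school with no reported figure
debt <- c(500, 1200, 3000, 4500, 8000, 12000, NA)
threshold <- 10000

# The mean of a logical vector is the share of TRUE values
share_below <- mean(debt < threshold, na.rm = TRUE)

# Express the share as a percentage
round(100 * share_below, 1)
```

Applied to the real data, mean(college$Debt < 10000, na.rm = TRUE) would give the exact share behind the statement above.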

Question 6: What percentage of schools are main campuses (Main)?

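main_counts <- table(college$Main)
# table() sorts the codes, so the count for 0 (branch) comes before the count for 1 (main)
labels <- c("Branch Campus", "Main Campus")
pie(
  x = main_counts, 
  labels = labels, 
  main = "Percentage of Main vs Branch Campuses"
)

This pie chart compares the number of main campuses with the number of branch campuses. Main is coded 1 for a main campus and 0 for a branch (the six schools printed above all have Main = 1), so the larger slice corresponds to main campuses: the majority of schools are main campuses.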
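The question asks for a percentage, which a pie chart only shows visually. The sketch below computes it directly with prop.table() on a hypothetical 0/1 vector, assuming that 1 marks a main campus and 0 a branch campus. Attaching the labels through factor() ties each label to its code, so the order of the table cannot mislabel the slices.

```r
# Hypothetical Main flags (assumed coding: 1 = main campus, 0 = branch campus)
main_flag <- c(1, 1, 1, 0, 1, 0, 1, 1)

# Bind each label to its code explicitly instead of relying on table order
main_factor <- factor(main_flag,
                      levels = c(0, 1),
                      labels = c("Branch Campus", "Main Campus"))

# Convert counts to percentages
pct <- round(100 * prop.table(table(main_factor)), 1)
pct
```

The resulting table can be passed to pie() as is, since its names already carry the correct labels.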

Question 7: Does the percentage of part-time students (PartTime) correlate with the completion rate (CompRate)?

cor(college$PartTime, college$CompRate, use = "complete.obs")
## [1] -0.4190961

The correlation between the percentage of part-time students and the completion rate is moderately negative (about -0.42). This tells us that schools with a higher percentage of part-time students tend to have lower completion rates.
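A single correlation coefficient does not say how precisely it is estimated. As a hedged illustration, cor.test() reports a confidence interval and a p-value alongside the estimate. The data below are simulated with a built-in negative relationship and are not the college data.

```r
# Simulated data with a negative relationship (illustrative only)
set.seed(1)
part_time <- runif(50, min = 0, max = 60)
comp_rate <- 70 - 0.5 * part_time + rnorm(50, sd = 10)

# Pearson correlation test; incomplete pairs are dropped automatically
result <- cor.test(part_time, comp_rate)

result$estimate   # sample correlation
result$conf.int   # 95% confidence interval for the correlation
result$p.value    # test of the null hypothesis that the correlation is zero
```

On the real data the call would be cor.test(college$PartTime, college$CompRate). A negative correlation describes an association only and does not show that part-time enrollment causes lower completion.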

Question 8: What proportion of schools have a high degree type of “graduate” (HighDegree)?

college$HighDegree <- as.numeric(as.character(college$HighDegree))

college_clean <- college[!is.na(college$HighDegree),]

degree_counts <- table(college_clean$HighDegree)

degree_labels <- c("No Degrees", "Bachelors", "Graduate")

percentage_labels <- paste(degree_labels, "at", round((degree_counts / sum(degree_counts)) * 100, 1), "%")

colors <- c("red", "green", "blue")

pie(
  x = degree_counts, 
  labels = percentage_labels, 
  main = "Proportion of Schools by Highest Degree Type",
  col = colors
)

This pie chart shows that a graduate degree is the most common highest degree type among the colleges in the dataset.
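The pie chart labels above are assigned by position, which only works if the table contains exactly three codes in the expected order. A more defensive sketch looks the labels up by code. The mapping below (0 = no degrees, 1 = certificate, 2 = associate, 3 = bachelors, 4 = graduate) is an assumption about how HighDegree is coded and should be checked against the dataset documentation; the values in high_degree are hypothetical.

```r
# Hypothetical HighDegree codes for seven schools
high_degree <- c(4, 4, 3, 4, 3, 4, 4)

# Assumed coding of HighDegree (verify against the dataset documentation)
code_names <- c("0" = "No Degrees", "1" = "Certificate", "2" = "Associate",
                "3" = "Bachelors", "4" = "Graduate")

# Look labels up by code, so only codes that occur get a label
degree_counts <- table(high_degree)
degree_labels <- code_names[names(degree_counts)]
degree_labels

# Proportion of schools whose highest degree is "graduate" (code 4)
prop_graduate <- mean(high_degree == 4, na.rm = TRUE)
prop_graduate
```

The last line answers the question as a single number, which complements the chart.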

Question 9: Is there a correlation between median family income (MedIncome) and net price (NetPrice)?

cor(college$MedIncome, college$NetPrice, use = "complete.obs")
## [1] 0.5151298

There is a moderate positive correlation (about 0.52) between median family income and net price, meaning that schools whose students come from higher-income families tend to have higher net prices.
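One way to interpret the size of this correlation is to square it: in a simple linear regression, the squared correlation equals the share of variance in one variable explained by the other. The sketch below squares the reported value and then checks that identity on simulated data (the income and price vectors are made up for illustration).

```r
# Correlation reported above for MedIncome and NetPrice
r <- 0.5151298
r_squared <- r^2
round(r_squared, 3)

# Check the identity R^2 = r^2 on simulated (hypothetical) data
set.seed(2)
income <- runif(40, min = 15, max = 90)
price <- 10000 + 150 * income + rnorm(40, sd = 4000)
fit <- lm(price ~ income)
same <- all.equal(summary(fit)$r.squared, cor(income, price)^2)
same
```

Under this reading, roughly a quarter of the variation in net price is associated with differences in family income, and the rest is left to other factors.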

Question 10: How is undergraduate enrollment (Enrollment) distributed across schools?

# Use Inf as the top break so schools with 50,000 or more undergraduates are not dropped as NA
ranges <- cut(college$Enrollment, breaks = c(0, 1000, 5000, 10000, 20000, Inf), right = FALSE)

colors <- c("red", "green", "blue", "yellow", "purple")

counts <- table(ranges)

labels <- c("0-1k", "1k-5k", "5k-10k", "10k-20k", "20k+")

barplot(
  height = counts, 
  names.arg = labels, 
  main = "Distribution of Undergraduate Enrollment",
  xlab = "Enrollment Range",
  ylab = "Number of Colleges",
  col = colors
)

This barplot shows us that the majority of colleges have an undergraduate enrollment between 0 and 5,000.
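The bin edges deserve a closer look. With right = FALSE, cut() builds intervals that include their lower bound and exclude their upper bound, so any value at or above a finite top break falls outside every bin and becomes NA. The sketch below compares a finite top break with an open-ended one on hypothetical enrollment figures.

```r
# Hypothetical enrollments, chosen to sit on and beyond the bin edges
enrollment <- c(999, 1000, 4999, 5000, 52000)

# Finite top break: 52000 lies outside [20000, 50000) and becomes NA
capped <- cut(enrollment,
              breaks = c(0, 1000, 5000, 10000, 20000, 50000),
              right = FALSE)

# Open-ended top break: every non-negative value lands in a bin
open_ended <- cut(enrollment,
                  breaks = c(0, 1000, 5000, 10000, 20000, Inf),
                  right = FALSE)

sum(is.na(capped))      # values silently dropped by the finite break
sum(is.na(open_ended))  # values dropped by the open-ended break
table(open_ended)
```

Checking sum(is.na(ranges)) against sum(is.na(college$Enrollment)) on the real data would confirm whether any school was lost to the binning.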