Based on my understanding of the CollegeScores4yr data, I propose these 10 questions:
1. What is the mean admission rate (AdmitRate) across all schools?
2. What is the median undergraduate enrollment (Enrollment)?
3. What is the standard deviation of in-state tuition (TuitionIn)?
4. What does a boxplot of the average combined SAT scores (AvgSAT) across schools show?
5. How are debt levels (Debt) distributed among schools?
6. What percentage of schools are main campuses (Main)?
7. Does the percentage of part-time students (PartTime) correlate with the completion rate (CompRate)?
8. What proportion of schools have "graduate" as their highest degree type (HighDegree)?
9. Is there a correlation between median family income (MedIncome) and net price (NetPrice)?
10. How is undergraduate enrollment (Enrollment) distributed across schools?
I will now explore each question in detail:
college = read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)
## Name State ID Main
## 1 Alabama A & M University AL 100654 1
## 2 University of Alabama at Birmingham AL 100663 1
## 3 Amridge University AL 100690 1
## 4 University of Alabama in Huntsville AL 100706 1
## 5 Alabama State University AL 100724 1
## 6 The University of Alabama AL 100751 1
## Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
## MainDegree HighDegree Control Region Locale Latitude Longitude AdmitRate
## 1 3 4 Public Southeast City 34.78337 -86.56850 0.9027
## 2 3 4 Public Southeast City 33.50570 -86.79935 0.9181
## 3 3 4 Private Southeast City 32.36261 -86.17401 NA
## 4 3 4 Public Southeast City 34.72456 -86.64045 0.8123
## 5 3 4 Public Southeast City 32.36432 -86.29568 0.9787
## 6 3 4 Public Southeast City 33.21187 -87.54598 0.5330
## MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1 18 929 0 4824 2.5 90.7 0.9 0.2 5.6 6.6
## 2 25 1195 0 12866 57.8 25.9 3.3 5.9 7.1 25.2
## 3 NA NA 1 322 7.1 14.3 0.6 0.3 77.6 54.4
## 4 28 1322 0 6917 74.2 10.7 4.6 4.0 6.5 15.0
## 5 18 935 0 4189 1.5 93.8 1.0 0.3 3.5 7.7
## 6 28 1278 0 32387 78.5 10.1 4.7 1.2 5.6 7.9
## NetPrice Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1 15184 22886 9857 18236 9227 7298 6983
## 2 17535 24129 8328 19032 11612 17235 10640
## 3 9649 15080 6900 6900 14738 5265 3866
## 4 19986 22108 10280 21480 8727 9748 9391
## 5 12874 19413 11068 19396 9003 7983 7399
## 6 21973 28836 10780 28100 13574 10894 10016
## FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1 71.3 71.0 23.96 1068 56.4 36.6 23.6
## 2 89.9 35.3 52.92 3755 63.9 34.1 34.5
## 3 100.0 74.2 18.18 109 64.9 51.3 15.0
## 4 64.6 27.7 48.62 1347 47.6 31.0 44.8
## 5 54.2 73.8 27.69 1294 61.3 34.3 22.1
## 6 74.0 18.0 67.87 6430 61.5 22.6 66.7
mean(college$AdmitRate, na.rm = TRUE)
## [1] 0.6702025
The mean admission rate across schools (excluding schools with missing values) is about 0.670, meaning the average school admits roughly 67% of its applicants.
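Because `mean()` averages the school-level rates, every school counts equally no matter how many applications it receives, so 0.67 is not the share of all applicants who were admitted. The sketch below uses two hypothetical schools (made-up numbers, not from the dataset) to show how an unweighted mean of rates can differ from a pooled rate:

```r
# Two hypothetical schools (illustrative numbers, not from CollegeScores4yr)
applicants <- c(100, 1000)
admitted   <- c(10, 900)

# Unweighted mean of school-level admission rates: each school counts once
rates <- admitted / applicants
mean_of_rates <- mean(rates)

# Pooled rate: total admitted divided by total applicants
pooled_rate <- sum(admitted) / sum(applicants)

print(c(mean_of_rates = mean_of_rates, pooled_rate = pooled_rate))
```

The columns shown by `head()` do not include applicant counts, so only the unweighted mean can be computed from this dataset.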
median(college$Enrollment, na.rm = TRUE)
## [1] 1722
The median undergraduate enrollment across all schools is 1,722 students.
sd(college$TuitionIn, na.rm = TRUE)
## [1] 14130.3
The standard deviation of in-state tuition across all schools is about $14,130.
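`sd()` returns the sample standard deviation, which divides the sum of squared deviations by n - 1 rather than n. As a quick check, the sketch below recomputes it by hand for the six in-state tuition values printed by `head()` above:

```r
# In-state tuition for the six schools shown by head(college)
tuition <- c(9857, 8328, 6900, 10280, 11068, 10780)

n <- length(tuition)
# Sample standard deviation: squared deviations from the mean, divided by n - 1
manual_sd <- sqrt(sum((tuition - mean(tuition))^2) / (n - 1))

print(c(manual = manual_sd, builtin = sd(tuition)))
```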
boxplot(college$AvgSAT, main = "Boxplot of Average SAT Scores", ylab = "Average SAT Scores")
This boxplot shows that most schools' average SAT scores fall between
about 1000 and 1200. The median sits around 1150, so half of the schools
that report SAT scores have averages above this value.
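The values above are read off the plot. To get the exact numbers a boxplot draws, `boxplot.stats()` returns the lower whisker, lower hinge, median, upper hinge, and upper whisker (the hinges come from `fivenum()`, so they can differ slightly from `quantile()`). A minimal sketch using a hypothetical helper `box_numbers()` and a made-up vector of scores:

```r
# Hypothetical helper: the five numbers boxplot() draws for a vector
box_numbers <- function(x) {
  x <- x[!is.na(x)]  # drop schools with no reported average SAT
  stats <- boxplot.stats(x)$stats
  setNames(stats, c("lower_whisker", "lower_hinge", "median",
                    "upper_hinge", "upper_whisker"))
}

# Made-up example scores (not from the dataset)
example_scores <- c(900, 1000, 1050, 1100, 1150, 1200, 1300, NA)
print(box_numbers(example_scores))
```

On the real data the call would be `box_numbers(college$AvgSAT)`.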
# Gray fill keeps the black bar outlines visible
hist(college$Debt, main = "Average Debt Levels", xlab = "Debt", col = "gray")
This histogram shows how debt levels are distributed among colleges,
with the majority of schools having debt below $10,000.
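The "majority below $10,000" reading can be checked numerically rather than estimated from the bars. A minimal sketch with a hypothetical helper `share_below()`, which returns the proportion of non-missing values strictly below a cutoff:

```r
# Hypothetical helper: proportion of non-missing values strictly below a cutoff
share_below <- function(x, cutoff) {
  mean(x < cutoff, na.rm = TRUE)
}

# Debt values for the six schools shown by head(college), plus one missing value
example_debt <- c(1068, 3755, 109, 1347, 1294, 6430, NA)
print(share_below(example_debt, 10000))
```

On the full data, `share_below(college$Debt, 10000)` returning a value above 0.5 would confirm the majority claim.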
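# table() lists the codes in sorted order (0, then 1), so attach labels explicitly: 0 = branch campus, 1 = main campus
main_factor <- factor(college$Main, levels = c(0, 1), labels = c("Branch Campus", "Main Campus"))
main_counts <- table(main_factor)
# Show each slice's percentage in its label
labels <- paste0(names(main_counts), " (", round(100 * prop.table(main_counts), 1), "%)")
pie(
x = main_counts,
labels = labels,
main = "Percentage of Main vs Branch Campuses"
)
This pie chart compares how many schools are main campuses versus
branch campuses, and shows that the majority of schools are main
campuses.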
cor(college$PartTime, college$CompRate, use = "complete.obs")
## [1] -0.4190961
The correlation between the percentage of part-time students and the completion rate is moderately negative (about -0.42): schools with a higher percentage of part-time students tend to have lower completion rates.
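The coefficient alone does not say whether a value like -0.42 could plausibly arise by chance. `cor.test()` reports the same Pearson estimate together with a confidence interval and p-value, and it drops incomplete pairs, much as `use = "complete.obs"` does for two variables. A sketch on made-up numbers (not from the dataset):

```r
# Made-up part-time and completion percentages (not from the dataset)
part_time  <- c(5, 10, 20, 30, 45, 60, NA)
completion <- c(70, 72, 55, 50, 40, 30, 65)

# Pearson correlation test; the pair containing NA is dropped
result <- cor.test(part_time, completion)
print(result$estimate)  # correlation coefficient
print(result$conf.int)  # 95% confidence interval
print(result$p.value)
```

On the real data the call would be `cor.test(college$PartTime, college$CompRate)`. Even a very small p-value only indicates an association, not that part-time enrollment causes lower completion.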
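college$HighDegree <- as.numeric(as.character(college$HighDegree))
college_clean <- college[!is.na(college$HighDegree),]
degree_counts <- table(college_clean$HighDegree)
# Build labels from the codes actually present so each slice gets the right name
# (College Scorecard coding: 0 = non-degree, 1 = certificate, 2 = associate, 3 = bachelor's, 4 = graduate)
degree_names <- c("0" = "Non-degree", "1" = "Certificate", "2" = "Associate", "3" = "Bachelors", "4" = "Graduate")
degree_labels <- degree_names[names(degree_counts)]
percentage_labels <- paste0(degree_labels, " (", round(100 * prop.table(degree_counts), 1), "%)")
colors <- c("red", "green", "blue")
pie(
x = degree_counts,
labels = percentage_labels,
main = "Proportion of Schools by Highest Degree Type",
col = colors
)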
This pie chart shows that a graduate degree is the most common highest
degree offered by the colleges in the dataset.
cor(college$MedIncome, college$NetPrice, use = "complete.obs")
## [1] 0.5151298
There is a moderate positive correlation (about 0.52) between median family income and net price: schools whose students come from higher-income families tend to have higher net prices.
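To put a size on this association, a simple linear regression gives the estimated change in net price per unit of MedIncome. Assuming MedIncome is recorded in thousands of dollars (as the values printed by `head()` suggest), the slope is the estimated change in net price per $1,000 of median family income. The sketch below uses only the six schools from `head()`, which is far too few to draw conclusions and is for illustration only:

```r
# MedIncome and NetPrice for the six schools shown by head(college)
med_income <- c(23.6, 34.5, 15.0, 44.8, 22.1, 66.7)  # assumed to be in $1,000s
net_price  <- c(15184, 17535, 9649, 19986, 12874, 21973)

fit <- lm(net_price ~ med_income)
# Intercept and slope; the slope is the estimated change in net price per unit of med_income
print(coef(fit))
```

On the full data, `lm(NetPrice ~ MedIncome, data = college)` fits the same model and drops rows with missing values by default.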
# Use Inf as the last break so schools with 50,000 or more undergraduates are not dropped as NA
ranges <- cut(college$Enrollment, breaks = c(0, 1000, 5000, 10000, 20000, Inf), right = FALSE)
colors <- c("red", "green", "blue", "yellow", "purple")
counts <- table(ranges)
labels <- c("0-1k", "1k-5k", "5k-10k", "10k-20k", "20k+")
barplot(
height = counts,
names.arg = labels,
main = "Distribution of Undergraduate Enrollment",
xlab = "Enrollment Range",
ylab = "Number of Colleges",
col = colors
)
This barplot shows that the majority of colleges have an undergraduate
enrollment between 0 and 5,000.
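One detail worth checking with `cut()`: with `right = FALSE` each bin includes its lower break and excludes its upper one, and any value at or above the last break becomes `NA`, which `table()` then drops without warning. A minimal check on made-up enrollments:

```r
# Made-up enrollments, including one above 50,000
enrollment <- c(0, 999, 1000, 4999, 60000)

binned_capped <- cut(enrollment, breaks = c(0, 1000, 5000, 10000, 20000, 50000), right = FALSE)
binned_open   <- cut(enrollment, breaks = c(0, 1000, 5000, 10000, 20000, Inf), right = FALSE)

print(table(binned_capped))  # the 60,000 value is NA and not counted
print(table(binned_open))    # Inf as the last break keeps it in the top bin
```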