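We use data from the Lock5Stat College Data webpage to explore various aspects of US colleges. Based on our own reading of the data, this analysis addresses ten questions covering enrollment, SAT scores, admission rates, faculty salaries, and more:
1. What is the mean enrollment across all colleges?
2. What is the median of colleges' average SAT scores?
3. What is the variance of admission rates?
4. What is the standard deviation of the percentage of first-generation students?
5. What is the correlation between net price and average debt?
6. How are in-state tuition fees distributed?
7. How do faculty salaries vary across control types?
8. What percentage of schools is located in each region?
9. How do admission rates vary across regions?
10. What is the range of the percentage of students receiving Pell grants?
We explore each question in detail below.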
college = read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)
## Name State ID Main
## 1 Alabama A & M University AL 100654 1
## 2 University of Alabama at Birmingham AL 100663 1
## 3 Amridge University AL 100690 1
## 4 University of Alabama in Huntsville AL 100706 1
## 5 Alabama State University AL 100724 1
## 6 The University of Alabama AL 100751 1
## Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
## MainDegree HighDegree Control Region Locale Latitude Longitude AdmitRate
## 1 3 4 Public Southeast City 34.78337 -86.56850 0.9027
## 2 3 4 Public Southeast City 33.50570 -86.79935 0.9181
## 3 3 4 Private Southeast City 32.36261 -86.17401 NA
## 4 3 4 Public Southeast City 34.72456 -86.64045 0.8123
## 5 3 4 Public Southeast City 32.36432 -86.29568 0.9787
## 6 3 4 Public Southeast City 33.21187 -87.54598 0.5330
## MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1 18 929 0 4824 2.5 90.7 0.9 0.2 5.6 6.6
## 2 25 1195 0 12866 57.8 25.9 3.3 5.9 7.1 25.2
## 3 NA NA 1 322 7.1 14.3 0.6 0.3 77.6 54.4
## 4 28 1322 0 6917 74.2 10.7 4.6 4.0 6.5 15.0
## 5 18 935 0 4189 1.5 93.8 1.0 0.3 3.5 7.7
## 6 28 1278 0 32387 78.5 10.1 4.7 1.2 5.6 7.9
## NetPrice Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1 15184 22886 9857 18236 9227 7298 6983
## 2 17535 24129 8328 19032 11612 17235 10640
## 3 9649 15080 6900 6900 14738 5265 3866
## 4 19986 22108 10280 21480 8727 9748 9391
## 5 12874 19413 11068 19396 9003 7983 7399
## 6 21973 28836 10780 28100 13574 10894 10016
## FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1 71.3 71.0 23.96 1068 56.4 36.6 23.6
## 2 89.9 35.3 52.92 3755 63.9 34.1 34.5
## 3 100.0 74.2 18.18 109 64.9 51.3 15.0
## 4 64.6 27.7 48.62 1347 47.6 31.0 44.8
## 5 54.2 73.8 27.69 1294 61.3 34.3 22.1
## 6 74.0 18.0 67.87 6430 61.5 22.6 66.7
## [1] 4484.831
The mean enrollment across all colleges is 4484.831.
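`mean()` returns `NA` as soon as one value is missing, which is why the enrollment mean is computed with `na.rm = TRUE`. A minimal sketch, using three enrollment values from the `head()` output plus one missing value added purely for illustration, shows that `na.rm = TRUE` simply averages the observed values:

```r
# Three enrollments from head(college), plus an NA added for illustration
enrollment <- c(4824, 12866, NA, 6917)

mean_with_na <- mean(enrollment)                # NA: one missing value propagates
mean_dropped <- mean(enrollment, na.rm = TRUE)  # mean of the three observed values

# Same result by hand: sum the observed values and divide by how many there are
observed <- enrollment[!is.na(enrollment)]
mean_manual <- sum(observed) / length(observed)
```

The same reasoning applies to every `na.rm = TRUE` call in this report: colleges with missing values are silently excluded from the summary.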
## [1] 1121
Among colleges that report an average SAT score, the median of those averages is 1121.
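The appendix converts `AvgSAT` with `as.numeric(as.character(...))` before taking the median. That idiom matters if the column is ever read as a factor (for example with `stringsAsFactors = TRUE`, the `read.csv()` default before R 4.0.0, and a non-numeric entry in the column): calling `as.numeric()` directly on a factor returns its internal level codes rather than the numbers. A small sketch with SAT values taken from the `head()` output:

```r
# SAT averages from head(college), deliberately stored as a factor
sat_factor <- factor(c("929", "1195", "1322", "935", "1278"))

# Wrong: returns the factor's internal level codes, not SAT scores
level_codes <- as.numeric(sat_factor)

# Right: convert the labels to character first, then to numbers
sat <- as.numeric(as.character(sat_factor))

sat_median <- median(sat)
```

Because factor levels are sorted as strings, `"1195"` gets code 1 and `"929"` gets code 4, so the "wrong" version produces small integers that have nothing to do with SAT scores.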
## [1] 0.04333848
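The variance of admission rates (recorded as proportions between 0 and 1) is 0.04333848, which corresponds to a standard deviation of about 0.208, or roughly 21 percentage points.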
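`var()` returns the sample variance, which divides the sum of squared deviations from the mean by n - 1 rather than n. A minimal sketch, using the admission rates shown in the `head()` output (including the missing value for the third college), checks the built-in result against that formula:

```r
# Admission rates from head(college); the third college has no reported rate
admit <- c(0.9027, 0.9181, NA, 0.8123, 0.9787, 0.5330)

a <- admit[!is.na(admit)]
n <- length(a)

# Sample variance: squared deviations from the mean, divided by n - 1
var_manual  <- sum((a - mean(a))^2) / (n - 1)
var_builtin <- var(admit, na.rm = TRUE)
```

Five values are far too few to say anything about the full dataset; the point is only the formula.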
## [1] 11.08522
The standard deviation of the percentage of first-generation students is 11.08522 percentage points.
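The standard deviation is the square root of the sample variance, which puts it back in the units of the data (percentage points for `FirstGen`). A short sketch with the `FirstGen` values from the `head()` output:

```r
# First-generation percentages from head(college)
first_gen <- c(36.6, 34.1, 51.3, 31.0, 34.3, 22.6)

# sd() equals the square root of the (n - 1) sample variance,
# so it is measured in percentage points, like the data itself
sd_builtin  <- sd(first_gen)
sd_from_var <- sqrt(var(first_gen))
```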
## [1] -0.1091143
The correlation between net price and average debt is -0.1091143, a weak negative linear relationship.
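The appendix computes this with `cor(..., use = "complete.obs")`, which keeps only the rows where both variables are observed. A minimal sketch, using `NetPrice` and `Debt` from the `head()` output with one `Debt` value blanked out to mimic missing data, checks the built-in result against the Pearson formula on the complete cases:

```r
net_price <- c(15184, 17535, 9649, 19986, 12874, 21973)
debt      <- c(1068, 3755, NA, 1347, 1294, 6430)  # third value set to NA for illustration

# "complete.obs" drops any row where either variable is missing
r_builtin <- cor(net_price, debt, use = "complete.obs")

# Pearson correlation by hand on the complete cases
keep <- !is.na(net_price) & !is.na(debt)
x <- net_price[keep]
y <- debt[keep]
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
```

With only five complete rows this toy value says nothing about the full-data correlation reported above.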
The distribution of in-state tuition fees is right-skewed: most colleges charge relatively low in-state tuition, and a smaller number charge considerably more.
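The histogram is the main evidence for skew. As a numerical complement, a right-skewed variable usually has its mean above its median and a positive moment-based skewness. The sketch below uses one common skewness definition (several variants exist) on made-up tuition values, not on the college data:

```r
# Moment-based sample skewness: positive values indicate a long right tail
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^(3 / 2)
}

# Made-up tuition values with one expensive outlier (illustrative only)
tuition <- c(5000, 6000, 7000, 8000, 30000)

tuition_skew <- skewness(tuition)                     # positive
mean_above_median <- mean(tuition) > median(tuition)  # the tail pulls the mean up
```

On the real data the same function would be called as `skewness(college$TuitionIn)`.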
Faculty salaries vary by control type, with private colleges typically paying the highest salaries.
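A boxplot compares groups visually; `tapply()` gives the group medians that its center lines represent. A minimal sketch on made-up salaries and control types (illustrative values only, not from the dataset):

```r
# Made-up faculty salaries and control types (illustrative only)
fac_salary <- c(7000, 9500, 8200, 11000, 12500, 6400)
control    <- c("Public", "Public", "Public", "Private", "Private", "Private")

# Median salary per control type, the values a boxplot draws as center lines
salary_medians <- tapply(fac_salary, control, median)
```

On the real data this would be `tapply(college$FacSalary, college$Control, median, na.rm = TRUE)`.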
The share of schools varies across US regions: the Northeast has the largest share (27.4%) and Territory the smallest (2.4%).
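Regional percentages come from counting schools per region with `table()` and dividing by the total with `prop.table()`. A sketch on ten made-up region labels, which also builds pie-chart labels that show each percentage:

```r
# Made-up region labels for ten schools (illustrative only)
region <- c("Northeast", "Northeast", "Northeast", "Midwest", "Midwest",
            "Southeast", "Southeast", "West", "Southwest", "Territory")

region_counts <- table(region)
region_pct <- round(prop.table(region_counts) * 100, 1)

# Labels such as "Northeast (30%)"; pie(region_pct, labels = pie_labels) would draw them
pie_labels <- paste0(names(region_pct), " (", region_pct, "%)")
```

After rounding, percentages need not sum to exactly 100 in general; here they do because every share is a whole number.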
Admission rates vary across regions: the Midwest has the highest mean admission rate and the Southeast the lowest.
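The regional comparison uses `tapply()` to compute one mean admission rate per region; `which.max()` and `which.min()` then pick out the extremes. A sketch with made-up rates for hypothetical regions A, B, and C:

```r
# Made-up admission rates for hypothetical regions (illustrative only)
admit  <- c(0.80, 0.70, NA, 0.60, 0.50, 0.90)
region <- c("A", "A", "A", "B", "B", "C")

# Extra arguments after the function (na.rm = TRUE) are passed on to mean()
region_means <- tapply(admit, region, mean, na.rm = TRUE)

highest <- names(which.max(region_means))
lowest  <- names(which.min(region_means))
```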
## [1] 0.0 97.6
## [1] 97.6
The percentage of students receiving Pell grants ranges from 0 to 97.6 across colleges, a range of 97.6 percentage points.
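In R, `range()` returns the minimum and maximum as a two-element vector, which is why the output shows `0.0 97.6` before the single number 97.6; the spread itself is the difference of those two values. A sketch with the `Pell` values from the `head()` output:

```r
# Pell grant percentages from head(college)
pell <- c(71.0, 35.3, 74.2, 27.7, 73.8, 18.0)

pell_range  <- range(pell)      # c(min, max), not a single number
pell_spread <- diff(pell_range) # max - min
```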
This project analyzed various aspects of college data using R. We explored questions about enrollment, SAT scores, admission rates, first-generation students, net price and debt, tuition fees, faculty salaries across institution types, the regional distribution of schools, and Pell grant recipients.
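Key findings include:
- The mean enrollment is about 4,485 students.
- The median of colleges' average SAT scores is 1121.
- Net price and average debt are only weakly, negatively correlated (r = -0.11).
- In-state tuition fees are right-skewed.
- Private colleges typically pay the highest faculty salaries.
- The Northeast has the largest share of schools (27.4%) and Territory the smallest (2.4%).
- The Midwest has the highest mean admission rate and the Southeast the lowest.
- The percentage of Pell grant recipients ranges from 0 to 97.6.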
# Load data
college <- read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
# Q1: Mean enrollment
mean(college$Enrollment, na.rm = TRUE)
# Q2: Median of average SAT scores (via character, in case the column was read as a factor)
college$AvgSAT <- as.numeric(as.character(college$AvgSAT))
median(college$AvgSAT, na.rm = TRUE)
# Q3: Variance of admission rates
var(college$AdmitRate, na.rm = TRUE)
# Q4: Standard deviation of first-generation students
sd(college$FirstGen, na.rm = TRUE)
# Q5: Correlation between net price and average debt (complete cases only)
cor(college$NetPrice, college$Debt, use = "complete.obs")
# Q6: Distribution of in-state tuition fees
hist(college$TuitionIn, main = "Distribution of In-State Tuition Fees",
     xlab = "In-State Tuition Fees", ylab = "Frequency",
     col = "lightblue", border = "black", breaks = 20)
# Q7: Faculty salaries across control types
boxplot(FacSalary ~ Control, data = college,
        main = "Faculty Salaries by Control Type",
        xlab = "Control", ylab = "Faculty Salary")
# Q8: Percentage distribution of schools across regions
region_counts <- table(college$Region)
region_percentages <- prop.table(region_counts) * 100
pie(region_percentages,
    labels = paste0(names(region_percentages), " (", round(region_percentages, 1), "%)"),
    main = "Schools by Region")
# Q9: Mean admission rate in each region
region_admit_rate <- tapply(college$AdmitRate, college$Region, mean, na.rm = TRUE)
barplot(region_admit_rate, main = "Mean Admission Rate by Region",
        xlab = "Region", ylab = "Mean Admission Rate")
# Q10: Range of Pell grant percentage (min and max, then their difference)
range_pell <- range(college$Pell, na.rm = TRUE)
range_pell
range_pell[2] - range_pell[1]