We use the CollegeScores4yr dataset for our data analysis.
I propose the following 10 questions based on my own understanding of the data.
We explore each question in detail below.
college <- read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)
## Name State ID Main
## 1 Alabama A & M University AL 100654 1
## 2 University of Alabama at Birmingham AL 100663 1
## 3 Amridge University AL 100690 1
## 4 University of Alabama in Huntsville AL 100706 1
## 5 Alabama State University AL 100724 1
## 6 The University of Alabama AL 100751 1
## Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
## MainDegree HighDegree Control Region Locale Latitude Longitude AdmitRate
## 1 3 4 Public Southeast City 34.78337 -86.56850 0.9027
## 2 3 4 Public Southeast City 33.50570 -86.79935 0.9181
## 3 3 4 Private Southeast City 32.36261 -86.17401 NA
## 4 3 4 Public Southeast City 34.72456 -86.64045 0.8123
## 5 3 4 Public Southeast City 32.36432 -86.29568 0.9787
## 6 3 4 Public Southeast City 33.21187 -87.54598 0.5330
## MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1 18 929 0 4824 2.5 90.7 0.9 0.2 5.6 6.6
## 2 25 1195 0 12866 57.8 25.9 3.3 5.9 7.1 25.2
## 3 NA NA 1 322 7.1 14.3 0.6 0.3 77.6 54.4
## 4 28 1322 0 6917 74.2 10.7 4.6 4.0 6.5 15.0
## 5 18 935 0 4189 1.5 93.8 1.0 0.3 3.5 7.7
## 6 28 1278 0 32387 78.5 10.1 4.7 1.2 5.6 7.9
## NetPrice Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1 15184 22886 9857 18236 9227 7298 6983
## 2 17535 24129 8328 19032 11612 17235 10640
## 3 9649 15080 6900 6900 14738 5265 3866
## 4 19986 22108 10280 21480 8727 9748 9391
## 5 12874 19413 11068 19396 9003 7983 7399
## 6 21973 28836 10780 28100 13574 10894 10016
## FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1 71.3 71.0 23.96 1068 56.4 36.6 23.6
## 2 89.9 35.3 52.92 3755 63.9 34.1 34.5
## 3 100.0 74.2 18.18 109 64.9 51.3 15.0
## 4 64.6 27.7 48.62 1347 47.6 31.0 44.8
## 5 54.2 73.8 27.69 1294 61.3 34.3 22.1
## 6 74.0 18.0 67.87 6430 61.5 22.6 66.7
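The preview already shows NA values (AdmitRate, MidACT, and AvgSAT are missing for Amridge University), which is why the calculations below pass na.rm = TRUE. Here is a minimal sketch for counting missing values. It assumes a small data frame, `sample_rows` (a hypothetical name), typed in from three columns of the six rows printed above rather than the full download.

```r
# Six rows copied from the head(college) output above (three columns only).
# `sample_rows` is a hypothetical name used just for this sketch.
sample_rows <- data.frame(
  AdmitRate = c(0.9027, 0.9181, NA, 0.8123, 0.9787, 0.5330),
  MidACT    = c(18, 25, NA, 28, 18, 28),
  AvgSAT    = c(929, 1195, NA, 1322, 935, 1278)
)

# Count the missing values in each column.
na_counts <- colSums(is.na(sample_rows))
print(na_counts)

# Keep only the rows that have no missing values in any column.
complete_rows <- sample_rows[complete.cases(sample_rows), ]
cat("Rows with no missing values:", nrow(complete_rows), "\n")
```

Running the same two calls on the full `college` data frame would show how many schools drop out of each calculation.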
mean_tuition <- mean(college$TuitionIn, na.rm = TRUE)
cat("1. Mean in-state tuition:", mean_tuition, "\n")
## 1. Mean in-state tuition: 21948.55
The mean in-state tuition is $21,948.55.
median_tuition <- median(college$TuitionIn, na.rm = TRUE)
cat("2. Median in-state tuition:", median_tuition, "\n")
## 2. Median in-state tuition: 17662
The median in-state tuition is $17,662.
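The mean ($21,948.55) sits well above the median ($17,662), which usually points to a right-skewed distribution where a few expensive schools pull the mean up. Here is a small sketch of that comparison. It uses the two values reported above plus a made-up vector, `toy_tuition`, which is hypothetical and only for illustration.

```r
# Values copied from the output above.
mean_tuition   <- 21948.55
median_tuition <- 17662

# How far the mean sits above the median.
gap <- mean_tuition - median_tuition
cat("Mean minus median:", gap, "\n")

# Hypothetical toy data: one large value pulls the mean above the median.
toy_tuition <- c(10, 12, 13, 15, 60)
toy_mean    <- mean(toy_tuition)
toy_median  <- median(toy_tuition)
cat("Toy mean:", toy_mean, " Toy median:", toy_median, "\n")
```

A histogram of college$TuitionIn would be the natural way to check the skew directly.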
sd_debt <- sd(college$Debt, na.rm = TRUE)
cat("3. Standard deviation of student debt:", sd_debt, "\n")
## 3. Standard deviation of student debt: 5360.986
The standard deviation of student debt is $5,360.99.
hist(college$CompRate,
main = "Histogram of Completion Rate",
xlab = "Completion Rate",
col = "lightblue",
border = "black")
The histogram suggests that completion rates cluster most heavily between 40% and 60%.
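A reading taken from a histogram can be checked numerically by computing the share of schools inside the range. The sketch below assumes only the six CompRate values shown in the head(college) preview, so its result describes those six rows, not the full dataset.

```r
# CompRate values for the six schools shown in the head(college) preview.
comp_rate <- c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87)

# TRUE for each school whose completion rate is between 40 and 60 percent.
in_range <- comp_rate >= 40 & comp_rate <= 60

# The mean of a logical vector is the proportion of TRUE values.
share_in_range <- mean(in_range, na.rm = TRUE)
cat("Share with a 40-60% completion rate:",
    round(100 * share_in_range, 1), "%\n")
```

Replacing `comp_rate` with college$CompRate would give the share for all schools.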
boxplot(college$AvgSAT,
main = "Boxplot of Average SAT Scores",
ylab = "Average SAT Score",
col = "lightgreen")
The boxplot shows that a typical school's average SAT score falls between roughly 1100 and 1200.
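The numbers behind a boxplot can be printed with fivenum(), which returns the minimum, lower hinge, median, upper hinge, and maximum. This is a sketch using only the AvgSAT values from the six-row preview (one of them is NA), so the figures are not those of the full dataset.

```r
# AvgSAT values for the six schools in the head(college) preview (one is NA).
avg_sat <- c(929, 1195, NA, 1322, 935, 1278)

# fivenum() drops NA values by default (na.rm = TRUE).
five_num <- fivenum(avg_sat)
names(five_num) <- c("min", "lower_hinge", "median", "upper_hinge", "max")
print(five_num)

# Width of the box, from the lower hinge to the upper hinge.
box_width <- five_num[["upper_hinge"]] - five_num[["lower_hinge"]]
cat("Box width:", box_width, "\n")
```

Calling fivenum(college$AvgSAT) would give the exact values the boxplot draws.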
barplot(table(college$Region),
main = "Number of Colleges by Region",
ylab = "Number of Colleges",
col = "lightcoral")
The Northeast has the most colleges of any region.
control_counts <- table(college$Control)
print(control_counts)
##
## Private Profit Public
## 1243 170 599
pie(control_counts,
main = "Schools by Control Type",
# one color per category: Private, Profit, Public
col = c("lightblue", "lightpink", "lightgreen"),
labels = paste(names(control_counts), "\n", control_counts))
There are more than twice as many private schools (1,243) as public schools (599).
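The comparison can be made explicit by turning the counts into percentages and a ratio. The sketch below re-enters the three counts printed above as a named vector, so it runs without the dataset.

```r
# Counts copied from the table(college$Control) output above.
control_counts <- c(Private = 1243, Profit = 170, Public = 599)

# Percentage of schools in each control type.
shares <- round(100 * prop.table(control_counts), 1)
print(shares)

# How many private schools there are per public school.
ratio <- control_counts[["Private"]] / control_counts[["Public"]]
cat("Private-to-public ratio:", round(ratio, 2), "\n")
```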
cor_sat_comp <- cor(college$AvgSAT, college$CompRate, use = "complete.obs")
cat("8. Correlation between AvgSAT and Completion Rate:", cor_sat_comp, "\n")
## 8. Correlation between AvgSAT and Completion Rate: 0.8189495
There is a strong positive correlation (r = 0.82) between average SAT score and completion rate.
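The use = "complete.obs" argument matters because cor() returns NA when either variable has a missing value. Here is a sketch using only the six preview rows, so the coefficient it prints is not the 0.82 reported above. It also squares the reported correlation to show how much of the variation a straight-line fit would account for.

```r
# AvgSAT and CompRate for the six schools in the head(college) preview.
avg_sat   <- c(929, 1195, NA, 1322, 935, 1278)
comp_rate <- c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87)

# With the default settings, a single NA makes the result NA.
r_default <- cor(avg_sat, comp_rate)

# "complete.obs" drops any pair that has a missing value before computing r.
r_complete <- cor(avg_sat, comp_rate, use = "complete.obs")
cat("Default:", r_default, " Complete pairs only:", r_complete, "\n")

# Squaring the correlation reported above (0.8189495).
r_squared <- 0.8189495^2
cat("r squared for the full data:", round(r_squared, 2), "\n")
```

An r squared of about 0.67 means a linear fit on average SAT would account for roughly two thirds of the variation in completion rate.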
mean_admit <- mean(college$AdmitRate, na.rm = TRUE)
cat("9. Average admission rate:", mean_admit, "\n")
## 9. Average admission rate: 0.6702025
The average admission rate across colleges is about 67%.
median_income <- median(college$MedIncome, na.rm = TRUE)
cat("10. Median family income:", median_income, "\n")
## 10. Median family income: 42.6
The median family income is $42,600 (MedIncome is recorded in thousands of dollars).
This analysis answers all 10 of the questions we posed about the data.