These are some of the questions to explore.
1. What is the average ACT score across all colleges?
2. What is the median SAT score for colleges, and how does it vary across colleges?
3. What is the distribution of undergraduate enrollment across colleges?
4. What percentage of students identify as White, Black, Hispanic, Asian, or Other across colleges?
5. What is the average net price (NetPrice) for students across colleges, and how does it compare to the average total cost (Cost)?
6. Is there a difference in average tuition for in-state students (TuitionIn) versus out-of-state students (TuitonOut)?
7. What is the correlation between undergraduate enrollment (Enrollment) and average ACT scores (MidACT)?
8. What is the variance in average debt (Debt) for students who complete the program?
9. What is the average monthly salary for full-time faculty, and how does it compare to the percentage of full-time faculty?
10. What is the percentage of first-generation students across colleges, and how does it compare with the completion rate?
The details of the questions are discussed here.
college = read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)
## Name State ID Main
## 1 Alabama A & M University AL 100654 1
## 2 University of Alabama at Birmingham AL 100663 1
## 3 Amridge University AL 100690 1
## 4 University of Alabama in Huntsville AL 100706 1
## 5 Alabama State University AL 100724 1
## 6 The University of Alabama AL 100751 1
## Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
## MainDegree HighDegree Control Region Locale Latitude Longitude AdmitRate
## 1 3 4 Public Southeast City 34.78337 -86.56850 0.9027
## 2 3 4 Public Southeast City 33.50570 -86.79935 0.9181
## 3 3 4 Private Southeast City 32.36261 -86.17401 NA
## 4 3 4 Public Southeast City 34.72456 -86.64045 0.8123
## 5 3 4 Public Southeast City 32.36432 -86.29568 0.9787
## 6 3 4 Public Southeast City 33.21187 -87.54598 0.5330
## MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1 18 929 0 4824 2.5 90.7 0.9 0.2 5.6 6.6
## 2 25 1195 0 12866 57.8 25.9 3.3 5.9 7.1 25.2
## 3 NA NA 1 322 7.1 14.3 0.6 0.3 77.6 54.4
## 4 28 1322 0 6917 74.2 10.7 4.6 4.0 6.5 15.0
## 5 18 935 0 4189 1.5 93.8 1.0 0.3 3.5 7.7
## 6 28 1278 0 32387 78.5 10.1 4.7 1.2 5.6 7.9
## NetPrice Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1 15184 22886 9857 18236 9227 7298 6983
## 2 17535 24129 8328 19032 11612 17235 10640
## 3 9649 15080 6900 6900 14738 5265 3866
## 4 19986 22108 10280 21480 8727 9748 9391
## 5 12874 19413 11068 19396 9003 7983 7399
## 6 21973 28836 10780 28100 13574 10894 10016
## FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1 71.3 71.0 23.96 1068 56.4 36.6 23.6
## 2 89.9 35.3 52.92 3755 63.9 34.1 34.5
## 3 100.0 74.2 18.18 109 64.9 51.3 15.0
## 4 64.6 27.7 48.62 1347 47.6 31.0 44.8
## 5 54.2 73.8 27.69 1294 61.3 34.3 22.1
## 6 74.0 18.0 67.87 6430 61.5 22.6 66.7
mean_ACT <- mean(college$MidACT, na.rm = TRUE)
mean_ACT
## [1] 23.53514
The mean ACT score (MidACT) across all colleges is about 23.54.
median_SAT <- median(college$AvgSAT, na.rm = TRUE)
median_SAT
## [1] 1121
The median SAT score (AvgSAT) across all colleges is 1121.
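The median alone does not show how much the scores vary between colleges. A minimal sketch of common spread measures follows. The `sat` vector holds only the first six AvgSAT values from the head() output above, so the numbers are illustrative and not results for the full dataset; the same calls could be applied to `college$AvgSAT`.

```r
# First six AvgSAT values from the head() output (illustrative subset only)
sat <- c(929, 1195, NA, 1322, 935, 1278)

# Spread measures, ignoring missing values
sat_range <- range(sat, na.rm = TRUE)                         # minimum and maximum
sat_iqr   <- IQR(sat, na.rm = TRUE)                           # interquartile range
sat_sd    <- sd(sat, na.rm = TRUE)                            # standard deviation
sat_quart <- quantile(sat, c(0.25, 0.5, 0.75), na.rm = TRUE)  # quartiles

sat_range
sat_iqr
sat_sd
sat_quart
```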
mean_enrollment <- mean(college$Enrollment, na.rm = TRUE)
median_enrollment <- median(college$Enrollment, na.rm = TRUE)
mean_enrollment
## [1] 4484.831
median_enrollment
## [1] 1722
hist(college$Enrollment, main = "Distribution of Undergraduate Enrollment", xlab = "Enrollment", col = "lightcoral")
The mean undergraduate enrollment across colleges is about 4484.8, while
the median is 1722. The mean is well above the median, which suggests a
right-skewed distribution: most colleges are small and a few are very
large. The histogram is shown in the figure.
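When the mean and median are this far apart, comparing them directly and looking at the data on a log scale can make the skew easier to see. The sketch below uses only the first six Enrollment values from the head() output as an illustrative subset, and the log transform assumes every enrollment is positive.

```r
# First six Enrollment values from the head() output (illustrative subset only)
enrollment <- c(4824, 12866, 322, 6917, 4189, 32387)

mean_enr   <- mean(enrollment, na.rm = TRUE)
median_enr <- median(enrollment, na.rm = TRUE)

# A ratio well above 1 points to a long right tail
mean_enr / median_enr

# Log-transforming compresses the tail (assumes all enrollments are positive)
log_enr <- log10(enrollment)
summary(log_enr)
```

On the full data, the same idea could be applied with `hist(log10(college$Enrollment))`, under the same positivity assumption.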
ethnicity_means <- data.frame(
White = mean(college$White, na.rm = TRUE),
Black = mean(college$Black, na.rm = TRUE),
Hispanic = mean(college$Hispanic, na.rm = TRUE),
Asian = mean(college$Asian, na.rm = TRUE),
Other = mean(college$Other, na.rm = TRUE)
)
ethnicity_means
## White Black Hispanic Asian Other
## 1 55.10905 13.92342 13.10273 4.422476 13.46579
# Barplot of average ethnicity percentages
barplot(as.numeric(ethnicity_means), names.arg = colnames(ethnicity_means), main = "Average Ethnicity Distribution", col = rainbow(5))
The average ethnicity distribution is shown in the figure. On average, students are 55.11% White, 13.92% Black, 13.10% Hispanic, 4.42% Asian, and 13.47% Other.
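The five separate `mean()` calls can also be written as a single `colMeans()` call, which returns a named vector that `barplot()` accepts directly. This is a sketch of an alternative, not the approach used above; the `eth` data frame contains only the six rows shown in the head() output, so its means differ from the full-data results.

```r
# Ethnicity columns for the six rows shown in the head() output (illustrative subset)
eth <- data.frame(
  White    = c(2.5, 57.8, 7.1, 74.2, 1.5, 78.5),
  Black    = c(90.7, 25.9, 14.3, 10.7, 93.8, 10.1),
  Hispanic = c(0.9, 3.3, 0.6, 4.6, 1.0, 4.7),
  Asian    = c(0.2, 5.9, 0.3, 4.0, 0.3, 1.2),
  Other    = c(5.6, 7.1, 77.6, 6.5, 3.5, 5.6)
)

# One call computes every column mean and keeps the column names
eth_means <- colMeans(eth, na.rm = TRUE)
eth_means

# Sanity check: each college's percentages should add up to roughly 100
row_totals <- rowSums(eth)
row_totals
```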
mean_net_price <- mean(college$NetPrice, na.rm = TRUE)
mean_cost <- mean(college$Cost, na.rm = TRUE)
mean_net_price
## [1] 19886.82
mean_cost
## [1] 34277.31
# Boxplots to compare NetPrice and Cost
boxplot(college$NetPrice, college$Cost, names = c("Net Price", "Total Cost"), main = "Comparison of Net Price and Total Cost", col = c("lightblue", "lightgreen"))
The average net price for students is 19886.82 and the average total cost is 34277.31. The difference is 14390.49.
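The comparison can be checked directly from the two means printed above. The sketch below hard-codes those printed values; it gives a gap of 14390.49 and shows that the average net price is about 58% of the average total cost.

```r
# Means copied from the output above
mean_net_price <- 19886.82
mean_cost      <- 34277.31

price_gap <- mean_cost - mean_net_price   # absolute difference
net_share <- mean_net_price / mean_cost   # net price as a fraction of total cost

round(price_gap, 2)
round(net_share, 2)
```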
mean_in_state <- mean(college$TuitionIn, na.rm = TRUE)
mean_out_state <- mean(college$TuitonOut, na.rm = TRUE)
mean_in_state
## [1] 21948.55
mean_out_state
## [1] 25336.66
# Boxplot comparison
boxplot(college$TuitionIn, college$TuitonOut, names = c("In-State", "Out-of-State"), main = "In-State vs Out-of-State Tuition", col = c("lightblue", "lightpink"))
Yes, there is a difference in average tuition for in-state versus
out-of-state students. The mean out-of-state tuition (25336.66) exceeds
the mean in-state tuition (21948.55) by 3388.11.
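Because every college reports both tuition figures, the comparison can also be made per college rather than only between the two overall means. The sketch below is an assumption about how one might do that, not part of the original analysis: it uses the six rows from the head() output and a paired t-test via `t.test(..., paired = TRUE)`. With so few rows the test result is purely illustrative.

```r
# Tuition for the six colleges shown in the head() output (illustrative subset)
tuition_in  <- c(9857, 8328, 6900, 10280, 11068, 10780)
tuition_out <- c(18236, 19032, 6900, 21480, 19396, 28100)

# Per-college gap between out-of-state and in-state tuition
tuition_gap <- tuition_out - tuition_in
mean_gap    <- mean(tuition_gap)
mean_gap

# Number of colleges that charge the same price to both groups
same_price <- sum(tuition_gap == 0)
same_price

# Paired t-test on the per-college differences
tt <- t.test(tuition_out, tuition_in, paired = TRUE)
tt
```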
cor_enrollment_ACT <- cor(college$Enrollment, college$MidACT, use = "complete.obs")
cor_enrollment_ACT
## [1] 0.2572878
# Scatter plot to visualize the relationship
plot(college$Enrollment, college$MidACT, main = "Enrollment vs ACT Scores", xlab = "Enrollment", ylab = "ACT Score", col = "darkblue", pch = 16)
The correlation between undergraduate enrollment and ACT score is about 0.257, a weak positive relationship.
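The `use = "complete.obs"` argument matters here because MidACT has missing values: only colleges with both values present enter the calculation. The sketch below illustrates this on the six rows from the head() output (one of which has a missing ACT score) and adds `cor.test()` as one possible way to attach a p-value and confidence interval; the subset is far too small to say anything about the full dataset.

```r
# Enrollment and MidACT for the six rows in the head() output (illustrative subset)
enrollment <- c(4824, 12866, 322, 6917, 4189, 32387)
act        <- c(18, 25, NA, 28, 18, 28)

# Rows where both values are present
ok      <- complete.cases(enrollment, act)
n_pairs <- sum(ok)
n_pairs

# Pearson correlation using only the complete pairs
r <- cor(enrollment, act, use = "complete.obs")
r

# cor.test() also drops incomplete pairs and reports a p-value and interval
ct <- cor.test(enrollment, act)
ct
```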
variance_debt <- var(college$Debt, na.rm = TRUE)
variance_debt
## [1] 28740171
# Histogram of debt
hist(college$Debt, main = "Distribution of Average Student Debt", xlab = "Debt ($)", col = "purple")
The variance in average debt for students who complete the program
is 28740171 (in squared dollars).
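Because variance is expressed in squared units, the standard deviation is often easier to read alongside the histogram. A quick sketch using the variance printed above gives a standard deviation of roughly 5361 dollars; on the full data, `sd(college$Debt, na.rm = TRUE)` should give the same value.

```r
# Variance copied from the output above
variance_debt <- 28740171

# Standard deviation is the square root of the variance
sd_debt <- sqrt(variance_debt)
round(sd_debt)
```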
mean_fac_salary <- mean(college$FacSalary, na.rm = TRUE)
mean_full_time_fac <- mean(college$FullTimeFac, na.rm = TRUE)
mean_fac_salary
## [1] 7465.778
mean_full_time_fac
## [1] 64.8313
# Scatter plot to examine the relationship
plot(college$FullTimeFac, college$FacSalary, main = "Full-Time Faculty Percentage vs Average Salary", xlab = "Percentage of Full-Time Faculty", ylab = "Average Faculty Salary ($)", col = "darkgreen", pch = 16)
The average monthly salary of full-time faculty is about 7465.78, and on average 64.83% of a college's faculty are full time.
cor_firstgen_comprate <- cor(college$FirstGen, college$CompRate, use = "complete.obs")
cor_firstgen_comprate
## [1] -0.6643909
# Scatter plot to visualize the relationship
plot(college$FirstGen, college$CompRate, main = "First-Generation Students vs Completion Rate", xlab = "Percentage of First-Generation Students", ylab = "Completion Rate (%)", col = "orange", pch = 16)
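The correlation between the percentage of first-generation students and the completion rate is -0.6643909, a moderately strong negative relationship: colleges with more first-generation students tend to have lower completion rates.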
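A correlation of -0.6643909 corresponds to an R-squared of about 0.44 in a simple linear regression of completion rate on first-generation percentage. The sketch below shows that link with `lm()`. It is an illustrative extension rather than part of the original analysis, and it uses only the six rows from the head() output, so its coefficients do not describe the full dataset.

```r
# FirstGen and CompRate for the six rows in the head() output (illustrative subset)
first_gen <- c(36.6, 34.1, 51.3, 31.0, 34.3, 22.6)
comp_rate <- c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87)

# Simple linear regression of completion rate on first-generation percentage
fit   <- lm(comp_rate ~ first_gen)
slope <- unname(coef(fit)["first_gen"])
slope

# With a single predictor, R-squared equals the squared correlation
r         <- cor(first_gen, comp_rate)
r_squared <- summary(fit)$r.squared
c(r = r, r_squared = r_squared)
```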
The summary of results showed key patterns in college characteristics, including:
- Only a weak positive correlation (about 0.26) between enrollment and ACT scores, so higher enrollment did not necessarily go with higher ACT scores.
- Distinct differences in tuition costs between in-state and out-of-state students, reflecting varied financial burdens.
- A notable variation in ethnicity distribution across colleges, pointing to differences in campus diversity.
- A possible relationship between average faculty salary and the percentage of full-time faculty, with potential implications for faculty recruitment and retention.
Overall, this project applied descriptive statistical methods to uncover insights into the educational landscape, shedding light on factors that affect both student outcomes and institutional characteristics. These findings provide a foundation for further analyses, such as exploring relationships between college attributes and student success metrics, or comparing this dataset with similar educational data to assess trends over time.