These are some of the questions to explore:
1. What is the average ACT score across all colleges?
2. What is the median SAT score for colleges, and how does it vary across colleges?
3. What is the distribution of undergraduate enrollment across colleges?
4. What percentage of students identify as White, Black, Hispanic, Asian, or Other across colleges?
5. What is the average net price (NetPrice) for students across colleges, and how does it compare to the average total cost (Cost)?
6. Is there a difference in average tuition for in-state students (TuitionIn) versus out-of-state students (TuitonOut)?
7. What is the correlation between undergraduate enrollment (Enrollment) and average ACT scores (MidACT)?
8. What is the variance in average debt (Debt) for students who complete the program?
9. What is the average monthly salary for full-time faculty, and how does it compare to the percentage of full-time faculty?
10. What is the percentage of first-generation students across colleges, and how does it compare with the completion rate?
The questions are discussed in detail below:
college = read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)
## Name State ID Main
## 1 Alabama A & M University AL 100654 1
## 2 University of Alabama at Birmingham AL 100663 1
## 3 Amridge University AL 100690 1
## 4 University of Alabama in Huntsville AL 100706 1
## 5 Alabama State University AL 100724 1
## 6 The University of Alabama AL 100751 1
## Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
## MainDegree HighDegree Control Region Locale Latitude Longitude AdmitRate
## 1 3 4 Public Southeast City 34.78337 -86.56850 0.9027
## 2 3 4 Public Southeast City 33.50570 -86.79935 0.9181
## 3 3 4 Private Southeast City 32.36261 -86.17401 NA
## 4 3 4 Public Southeast City 34.72456 -86.64045 0.8123
## 5 3 4 Public Southeast City 32.36432 -86.29568 0.9787
## 6 3 4 Public Southeast City 33.21187 -87.54598 0.5330
## MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1 18 929 0 4824 2.5 90.7 0.9 0.2 5.6 6.6
## 2 25 1195 0 12866 57.8 25.9 3.3 5.9 7.1 25.2
## 3 NA NA 1 322 7.1 14.3 0.6 0.3 77.6 54.4
## 4 28 1322 0 6917 74.2 10.7 4.6 4.0 6.5 15.0
## 5 18 935 0 4189 1.5 93.8 1.0 0.3 3.5 7.7
## 6 28 1278 0 32387 78.5 10.1 4.7 1.2 5.6 7.9
## NetPrice Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1 15184 22886 9857 18236 9227 7298 6983
## 2 17535 24129 8328 19032 11612 17235 10640
## 3 9649 15080 6900 6900 14738 5265 3866
## 4 19986 22108 10280 21480 8727 9748 9391
## 5 12874 19413 11068 19396 9003 7983 7399
## 6 21973 28836 10780 28100 13574 10894 10016
## FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1 71.3 71.0 23.96 1068 56.4 36.6 23.6
## 2 89.9 35.3 52.92 3755 63.9 34.1 34.5
## 3 100.0 74.2 18.18 109 64.9 51.3 15.0
## 4 64.6 27.7 48.62 1347 47.6 31.0 44.8
## 5 54.2 73.8 27.69 1294 61.3 34.3 22.1
## 6 74.0 18.0 67.87 6430 61.5 22.6 66.7
mean_ACT <- mean(college$MidACT, na.rm = TRUE)
mean_ACT
## [1] 23.53514
The mean ACT score (MidACT) across all colleges is 23.53514.
median_SAT <- median(college$AvgSAT, na.rm = TRUE)
median_SAT
## [1] 1121
The median SAT score (AvgSAT) across all colleges is 1121.
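Question 2 also asks how SAT scores vary across colleges, which the median alone does not show. The sketch below computes a few common spread measures. It uses only the six AvgSAT values printed by head(college) above as a stand-in, and the vector name sat_sample is hypothetical, so the numbers are illustrative and are not results for the full dataset.

```r
# Illustrative only: the six AvgSAT values shown in head(college); one is NA.
sat_sample <- c(929, 1195, NA, 1322, 935, 1278)

# Centre and spread, ignoring the missing value.
sat_median <- median(sat_sample, na.rm = TRUE)
sat_sd     <- sd(sat_sample, na.rm = TRUE)      # standard deviation
sat_iqr    <- IQR(sat_sample, na.rm = TRUE)     # middle 50% of values
sat_range  <- range(sat_sample, na.rm = TRUE)   # minimum and maximum
```

On the full data, the same calls would be applied to college$AvgSAT.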
mean_enrollment <- mean(college$Enrollment, na.rm = TRUE)
median_enrollment <- median(college$Enrollment, na.rm = TRUE)
mean_enrollment
## [1] 4484.831
hist(college$Enrollment, main = "Distribution of Undergraduate Enrollment", xlab = "Enrollment", col = "lightcoral")
The mean undergraduate enrollment across colleges is 4484.831, and the median is 1722. The mean is well above the median, which suggests a right-skewed distribution with a few very large colleges. The histogram is shown in the figure.
ethnicity_means <- data.frame(
White = mean(college$White, na.rm = TRUE),
Black = mean(college$Black, na.rm = TRUE),
Hispanic = mean(college$Hispanic, na.rm = TRUE),
Asian = mean(college$Asian, na.rm = TRUE),
Other = mean(college$Other, na.rm = TRUE)
)
ethnicity_means
## White Black Hispanic Asian Other
## 1 55.10905 13.92342 13.10273 4.422476 13.46579
# Barplot of average ethnicity percentages
barplot(as.numeric(ethnicity_means), names.arg = colnames(ethnicity_means), main = "Average Ethnicity Distribution", col = rainbow(5))
The average ethnicity distribution is shown in the figure. On average across colleges: White 55.11%, Black 13.92%, Hispanic 13.10%, Asian 4.42%, Other 13.47%.
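The five separate mean() calls can also be written as a single column-wise operation. The sketch below shows one way to do this with colMeans(). The data frame eth_sample is a hypothetical stand-in built from the six rows printed by head(college) above, so the resulting means are illustrative only.

```r
# Illustrative only: ethnicity percentages for the six colleges in head(college).
eth_sample <- data.frame(
  White    = c(2.5, 57.8, 7.1, 74.2, 1.5, 78.5),
  Black    = c(90.7, 25.9, 14.3, 10.7, 93.8, 10.1),
  Hispanic = c(0.9, 3.3, 0.6, 4.6, 1.0, 4.7),
  Asian    = c(0.2, 5.9, 0.3, 4.0, 0.3, 1.2),
  Other    = c(5.6, 7.1, 77.6, 6.5, 3.5, 5.6)
)

# Select the columns by name and average each one in a single call.
eth_cols  <- c("White", "Black", "Hispanic", "Asian", "Other")
eth_means <- colMeans(eth_sample[, eth_cols], na.rm = TRUE)
```

Because colMeans() returns a named numeric vector, passing eth_means to barplot() labels the bars from the names without a separate names.arg.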
mean_net_price <- mean(college$NetPrice, na.rm = TRUE)
mean_cost <- mean(college$Cost, na.rm = TRUE)
mean_net_price
## [1] 19886.82
mean_cost
## [1] 34277.31
# Boxplots to compare NetPrice and Cost
boxplot(college$NetPrice, college$Cost, names = c("Net Price", "Total Cost"), main = "Comparison of Net Price and Total Cost", col = c("lightblue", "lightgreen"))
The average net price for students is 19886.82 and the average total cost is 34277.31, so on average the net price is 14390.49 lower than the total cost.
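Computing the gap in code rather than by hand avoids transcription slips. The sketch below is a minimal example that uses the NetPrice and Cost values from the six rows printed by head(college) above. The variable names are hypothetical and the result is illustrative, not the full-dataset figure.

```r
# Illustrative only: NetPrice and Cost for the six colleges in head(college).
net_price_sample <- c(15184, 17535, 9649, 19986, 12874, 21973)
cost_sample      <- c(22886, 24129, 15080, 22108, 19413, 28836)

# Difference between the two means, and that difference as a share of cost.
mean_gap  <- mean(cost_sample, na.rm = TRUE) - mean(net_price_sample, na.rm = TRUE)
gap_share <- mean_gap / mean(cost_sample, na.rm = TRUE)
```

With the full data, the same expression would be mean_cost - mean_net_price, using the objects already computed above.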
mean_in_state <- mean(college$TuitionIn, na.rm = TRUE)
mean_out_state <- mean(college$TuitonOut, na.rm = TRUE)
mean_in_state
## [1] 21948.55
mean_out_state
## [1] 25336.66
# Boxplot comparison
boxplot(college$TuitionIn, college$TuitonOut, names = c("In-State", "Out-of-State"), main = "In-State vs Out-of-State Tuition", col = c("lightblue", "lightpink"))
There appears to be a difference between the two groups: mean out-of-state tuition (25336.66) exceeds mean in-state tuition (21948.55) by 3388.11.
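The report compares the two means descriptively and does not run a formal test. One possible follow-up, sketched here under the assumption that each college contributes one in-state and one out-of-state value, is a paired t-test with t.test(). The vectors below are hypothetical stand-ins taken from the six rows printed by head(college) above, so the output is illustrative only.

```r
# Illustrative only: TuitionIn and TuitonOut for the six colleges in head(college).
tuition_in_sample  <- c(9857, 8328, 6900, 10280, 11068, 10780)
tuition_out_sample <- c(18236, 19032, 6900, 21480, 19396, 28100)

# Paired test: each college supplies one in-state and one out-of-state value.
tuition_test <- t.test(tuition_out_sample, tuition_in_sample, paired = TRUE)

# Mean of the per-college differences (out-of-state minus in-state).
mean_diff <- unname(tuition_test$estimate)
```

The p-value is available as tuition_test$p.value. With only six colleges this sample is far too small to draw conclusions; the full columns college$TuitonOut and college$TuitionIn would be used in practice.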
cor_enrollment_ACT <- cor(college$Enrollment, college$MidACT, use = "complete.obs")
cor_enrollment_ACT
## [1] 0.2572878
# Scatter plot to visualize the relationship
plot(college$Enrollment, college$MidACT, main = "Enrollment vs ACT Scores", xlab = "Enrollment", ylab = "ACT Score", col = "darkblue", pch = 16)
The correlation between undergraduate enrollment and ACT score is 0.2573, a weak positive relationship.
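The use = "complete.obs" argument matters here because MidACT contains missing values. The sketch below illustrates what it does: rows with an NA in either variable are dropped before the correlation is computed. It uses the six Enrollment and MidACT values printed by head(college) above, with hypothetical variable names, so the coefficient is illustrative only.

```r
# Illustrative only: Enrollment and MidACT for the six colleges in head(college).
enrollment_sample <- c(4824, 12866, 322, 6917, 4189, 32387)
act_sample        <- c(18, 25, NA, 28, 18, 28)

# Without handling NAs, cor() returns NA.
r_default <- cor(enrollment_sample, act_sample)

# "complete.obs" drops every pair that has a missing value.
r_complete <- cor(enrollment_sample, act_sample, use = "complete.obs")

# The same result, filtering the incomplete pairs by hand.
keep     <- complete.cases(enrollment_sample, act_sample)
r_manual <- cor(enrollment_sample[keep], act_sample[keep])
n_used   <- sum(keep)  # number of colleges actually used
```

Reporting n_used alongside the coefficient makes clear how many colleges the correlation is based on.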
variance_debt <- var(college$Debt, na.rm = TRUE)
variance_debt
## [1] 28740171
# Histogram of debt
hist(college$Debt, main = "Distribution of Average Student Debt", xlab = "Debt ($)", col = "purple")
From the data, the variance of average debt (Debt) is 28740171. This is a variance, measured in squared units of debt, not an average debt.
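A variance is hard to read directly because it is in squared units, so it is often converted to a standard deviation by taking the square root. The sketch below applies this to the variance reported above, then checks the relationship on the six Debt values printed by head(college); the variable names are hypothetical and the sample figures are illustrative only.

```r
# Standard deviation implied by the variance reported for the full dataset.
full_var <- 28740171
full_sd  <- sqrt(full_var)

# Illustrative only: Debt for the six colleges in head(college).
debt_sample <- c(1068, 3755, 109, 1347, 1294, 6430)
debt_var    <- var(debt_sample, na.rm = TRUE)
debt_sd     <- sd(debt_sample, na.rm = TRUE)

# sd() is the square root of var(), so this comparison holds.
same_spread <- isTRUE(all.equal(debt_sd^2, debt_var))
```

The standard deviation is in the same units as Debt itself, which makes it easier to compare with the values on the histogram's x-axis.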
mean_fac_salary <- mean(college$FacSalary, na.rm = TRUE)
mean_full_time_fac <- mean(college$FullTimeFac, na.rm = TRUE)
mean_fac_salary
## [1] 7465.778
mean_full_time_fac
## [1] 64.8313
# Scatter plot to examine the relationship
plot(college$FullTimeFac, college$FacSalary, main = "Full-Time Faculty Percentage vs Average Salary", xlab = "Percentage of Full-Time Faculty", ylab = "Average Faculty Salary ($)", col = "darkgreen", pch = 16)
The average monthly gross salary for full-time faculty is 7465.778, and on average 64.83% of faculty are full-time.
cor_firstgen_comprate <- cor(college$FirstGen, college$CompRate, use = "complete.obs")
cor_firstgen_comprate
## [1] -0.6643909
# Scatter plot to visualize the relationship
plot(college$FirstGen, college$CompRate, main = "First-Generation Students vs Completion Rate", xlab = "Percentage of First-Generation Students", ylab = "Completion Rate (%)", col = "orange", pch = 16)
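The correlation between the percentage of first-generation students and the completion rate is -0.6643909, a moderately strong negative relationship: colleges with a higher share of first-generation students tend to have lower completion rates.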
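A correlation gives the direction and strength of the relationship but not how much the completion rate changes per percentage point of first-generation students. One way to estimate that, sketched here as an assumption rather than something the report does, is a simple linear regression with lm(). The vectors are hypothetical stand-ins taken from the six rows printed by head(college) above, so the fitted values are illustrative only.

```r
# Illustrative only: FirstGen and CompRate for the six colleges in head(college).
first_gen_sample <- c(36.6, 34.1, 51.3, 31.0, 34.3, 22.6)
comp_rate_sample <- c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87)

# Simple linear regression of completion rate on first-generation share.
fit   <- lm(comp_rate_sample ~ first_gen_sample)
slope <- unname(coef(fit)[2])  # change in CompRate per 1-point rise in FirstGen

# In simple regression, R-squared equals the squared correlation.
r         <- cor(first_gen_sample, comp_rate_sample)
r_squared <- summary(fit)$r.squared
```

For the full dataset, the reported correlation of -0.6643909 corresponds to an R-squared of about 0.44. This describes association only and does not show that first-generation status causes lower completion.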
In my analysis, I discovered several key patterns in college characteristics:
Enrollment vs. ACT Scores: I found that enrollment and ACT scores were only weakly positively correlated (r = 0.26), so larger student populations do not reliably go with higher ACT scores.
Tuition Costs: I observed a noticeable difference in tuition costs between in-state and out-of-state students (about 3388 on average), which reflects the financial burden faced by those attending from out of state.
Ethnicity Distribution: The variation in ethnicity distribution across colleges stood out to me, pointing to differences in campus diversity and how this can shape the student experience.
Faculty Salaries: From the scatter plot, the average faculty salary seemed linked to the percentage of full-time faculty, although I did not compute a correlation for this pair. If the link holds, it suggests possible implications for recruitment and retention strategies.
Overall, this project allowed me to apply descriptive statistical methods to uncover insights into the educational landscape. These findings have laid the foundation for further analysis, such as exploring how college characteristics influence student success or comparing this dataset with others to assess trends over time.