This report investigates various descriptive aspects of the “CollegeScores4yr” data set. Using Chapter 6 methods (including mean, median, variance, standard deviation, correlation, histograms, box plots, bar plots, and pie charts), we address ten focused research questions. These questions were developed in two ways: first, by independent thinking and second, by consulting ChatGPT. The final set of questions covers diverse perspectives while limiting the analysis to one or two variables at a time.
I propose the following questions based on my own understanding of the data.
What is the mean average debt for students who complete the program?
Do universities with higher cost have a higher completion rate?
Are average ACT/SAT scores higher at public or private universities?
What is the correlation between net price and enrollment?
How is the percentage of students receiving Pell grants distributed?
What is the composition of schools based on control?
Are there differences in instructional spending per FTE based on whether it is a main campus or a branch campus?
Do different university types (public, private, for-profit) differ in faculty salary?
Does student population affect completion rate?
How does school control affect admission rates?
An overview of how I will explore each question:
## [1] 2365.655
## [1] 0.5870019
## # A tibble: 3 × 3
## Control AvgACT AvgSAT
## <chr> <dbl> <dbl>
## 1 Private 23.8 1146.
## 2 Profit 23.4 1124.
## 3 Public 23.0 1119.
## [1] -0.1392744
## [1] 0.4284533
## # A tibble: 2 × 2
## Main AvgInstructFTE
## <int> <dbl>
## 1 0 5803.
## 2 1 11418.
## # A tibble: 3 × 2
## Control AvgFacSalary
## <chr> <dbl>
## 1 Private 7091.
## 2 Profit 6234.
## 3 Public 8520.
## [1] 0.1678195
Across institutions, the mean of the average debt carried by students who complete their program is $2,365.66. Because every school counts equally in this figure regardless of its size, it describes the typical institution rather than the typical graduate.
The correlation between university cost and completion rate is 0.587, indicating a moderate positive relationship. This suggests that universities with higher costs tend to have higher completion rates, though other factors may also influence this trend.
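Private universities have the highest average scores (ACT 23.8, SAT 1146), followed by for-profit institutions (ACT 23.4, SAT 1124) and public institutions (ACT 23.0, SAT 1119). The differences are small: less than one ACT point and under 30 SAT points separate the highest and lowest groups.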
The correlation between average net price and enrollment is -0.139, indicating a very weak negative relationship. This suggests that as the net price of a university increases, enrollment tends to decrease slightly, but the relationship is not strong. Other factors, such as financial aid, school reputation, and program availability, may also influence enrollment numbers.
The correlation between in-state tuition and instructional spending per FTE student is 0.428, indicating a moderate positive relationship. This suggests that schools with higher in-state tuition tend to allocate more resources toward instructional spending per student, but the relationship is not very strong. Other factors, such as endowments, state funding, and institutional priorities, may also impact instructional spending levels.
Most institutions in the data set are private (1,243 of 2,012, about 62%), with fewer public schools (599, about 30%) and fewer still for-profit schools (170, about 8%).
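The shares of each control type can be checked directly from the counts printed by `table(college$Control)`. A minimal sketch, re-entering those counts by hand so it runs without the data set (the names `control_counts` and `control_pct` are only for illustration):

```r
# Counts copied from the table(college$Control) output in this report
control_counts <- c(Private = 1243, Profit = 170, Public = 599)

# Convert counts to percentages of all schools in the data set
control_pct <- round(100 * prop.table(control_counts), 1)
control_pct
```

This gives roughly 61.8% private, 8.4% for-profit, and 29.8% public.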
Main campuses spend an average of $11,418 per FTE student on instruction, nearly double the $5,803 spent at branch campuses. The gap may reflect differences in funding, resources, or institutional priorities.
Faculty at public universities earn the most on average ($8,520 per month), followed by private universities ($7,091) and for-profit institutions ($6,234). The difference could be due to funding sources, institutional priorities, or differences in faculty qualifications and workload.
The correlation between student population and completion rate is 0.168, a weak positive relationship. With a correlation this close to 0, enrollment size tells us little about a school's completion rate.
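One way to put the four correlations reported above on a common scale is to square them: for a straight-line fit with one predictor, the squared correlation is the share of variance in one variable accounted for by the other. A minimal sketch using the values printed in this report (the vector `r` and its labels are only for illustration):

```r
# Correlations copied from the output in this report
r <- c(
  cost_vs_completion       = 0.5870019,
  net_price_vs_enrollment  = -0.1392744,
  tuition_vs_instruct_fte  = 0.4284533,
  enrollment_vs_completion = 0.1678195
)

# Squared correlation: share of variance explained by a simple linear fit
r_squared <- round(r^2, 3)
r_squared
```

Read this way, enrollment accounts for under 3% of the variation in completion rate, compared with roughly 34% for cost.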
This distribution suggests that private institutions are the most selective in admissions, followed by public and then for-profit institutions.
When interpreting data, it is crucial to distinguish between correlation and causation. For instance, as discussed in question 5, there is a noticeable trend that colleges with a higher proportion of first-generation students tend to have lower completion rates. Some might infer that first-generation students are less motivated to complete college compared to their peers. However, we lack sufficient information to support this claim. The only conclusions we can draw are those directly supported by the calculations and visual representations of the data.
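The point about correlation and causation can be illustrated with a small simulation. The sketch below uses purely simulated data, none of it from CollegeScores4yr: a hypothetical lurking variable `z` drives both `x` and `y`, so the two are correlated even though neither influences the other.

```r
set.seed(1)  # fixed seed so the sketch is reproducible
n <- 1000

# Hypothetical lurking variable that influences both measured variables
z <- rnorm(n)
x <- z + rnorm(n, sd = 0.5)  # x depends on z, not on y
y <- z + rnorm(n, sd = 0.5)  # y depends on z, not on x

# x and y are strongly correlated even though neither causes the other
raw_cor <- cor(x, y)

# After removing the part of each variable explained by z,
# the remaining association is close to zero
adj_cor <- cor(resid(lm(x ~ z)), resid(lm(y ~ z)))

c(raw = raw_cor, adjusted_for_z = adj_cor)
```

In real data the lurking variable is usually unmeasured, which is why a correlation alone cannot support a causal claim.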
# Mean of the average debt across institutions, ignoring missing values
mean(college$Debt, na.rm = TRUE)
## [1] 2365.655
# Calculate the correlation between Cost and Completion Rate
cor(college$Cost, college$CompRate, use = "complete.obs")
## [1] 0.5870019
# Calculate the average ACT and SAT scores by school control
library(dplyr)  # provides %>%, group_by(), and summarize()
testScores <- college %>%
  group_by(Control) %>%
  summarize(
    AvgACT = mean(MidACT, na.rm = TRUE),
    AvgSAT = mean(AvgSAT, na.rm = TRUE)
  )
# Display the result
testScores
## # A tibble: 3 × 3
## Control AvgACT AvgSAT
## <chr> <dbl> <dbl>
## 1 Private 23.8 1146.
## 2 Profit 23.4 1124.
## 3 Public 23.0 1119.
cor(college$NetPrice, college$Enrollment, use = "complete.obs")
## [1] -0.1392744
cor(college$TuitionIn, college$InstructFTE, use = "complete.obs")
## [1] 0.4284533
# Count schools by control type and keep the table for the pie chart
control_summary <- table(college$Control)
control_summary
##
## Private Profit Public
## 1243 170 599
pie(control_summary,
    main = "Composition of Schools Based on Control",
    col = rainbow(length(control_summary)),
    labels = names(control_summary))
spendingByCampus <- college %>%
  group_by(Main) %>%
  summarize(AvgInstructFTE = mean(InstructFTE, na.rm = TRUE))
spendingByCampus
## # A tibble: 2 × 2
## Main AvgInstructFTE
## <int> <dbl>
## 1 0 5803.
## 2 1 11418.
# Compare average faculty salary across control types (private, for-profit, public)
salaryByControl <- college %>%
  group_by(Control) %>%
  summarize(AvgFacSalary = mean(FacSalary, na.rm = TRUE))
salaryByControl
## # A tibble: 3 × 2
## Control AvgFacSalary
## <chr> <dbl>
## 1 Private 7091.
## 2 Profit 6234.
## 3 Public 8520.
collegePublic <- subset(college, Control == "Public")
collegePrivate <- subset(college, Control == "Private")
collegeProfit <- subset(college, Control == "Profit")
boxplot(FacSalary ~ Control,
        data = collegePublic,
        main = "Faculty Monthly Salary for Public Universities",
        col = "orange",  # Color for the public universities
        ylab = "Faculty Monthly Salary",
        horizontal = FALSE)
boxplot(FacSalary ~ Control,
        data = collegePrivate,
        main = "Faculty Monthly Salary for Private Universities",
        col = "red",  # Color for the private universities
        ylab = "Faculty Monthly Salary",
        horizontal = FALSE)
boxplot(FacSalary ~ Control,
        data = collegeProfit,
        main = "Faculty Monthly Salary for Profit Universities",
        col = "blue",  # Color for the profit universities
        ylab = "Faculty Monthly Salary",
        horizontal = FALSE)
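Because the three plots above are drawn separately, each gets its own y-axis, which can make the groups harder to compare by eye. A possible alternative is a single `boxplot()` call that places all control types on a shared axis. The sketch below uses a small made-up data frame `toy` (its values are invented, not taken from the data set) so that it runs on its own; with the real data, `data = college` would presumably take its place.

```r
# Made-up values for illustration only; not taken from CollegeScores4yr
toy <- data.frame(
  Control   = rep(c("Private", "Profit", "Public"), each = 4),
  FacSalary = c(6500, 7000, 7200, 7600,
                5800, 6100, 6300, 6700,
                8000, 8400, 8600, 9000)
)

# One call draws a box per control type on a shared y-axis
bp <- boxplot(FacSalary ~ Control,
              data = toy,
              main = "Faculty Monthly Salary by Control",
              col  = c("red", "blue", "orange"),
              xlab = "Control",
              ylab = "Faculty Monthly Salary")

bp$names  # group labels, in the order the boxes are drawn
```

The colors follow the ones used in the separate plots (red for private, blue for for-profit, orange for public).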
cor(college$Enrollment, college$CompRate, use = "complete.obs")
## [1] 0.1678195
# Mean admission rate for each control type, ignoring missing values
mean_admission_rate <- tapply(college$AdmitRate, college$Control, mean, na.rm = TRUE)
mean_admission_rate
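`tapply()` splits its first argument by the grouping vector and applies the function to each piece; extra arguments such as `na.rm = TRUE` are passed through to `mean()`. A small sketch on made-up data (the data frame `toy` and its values are invented for illustration and do not come from the data set) shows the shape of the result and one way to plot it:

```r
# Made-up values for illustration only; not taken from CollegeScores4yr
toy <- data.frame(
  Control   = c("Private", "Private", "Public", "Public", "Profit"),
  AdmitRate = c(0.5, 0.7, 0.8, NA, 0.9)
)

# Group means; the NA in the Public group is dropped by na.rm = TRUE
toy_means <- tapply(toy$AdmitRate, toy$Control, mean, na.rm = TRUE)
toy_means

# A bar plot is one way to compare the group means visually
barplot(toy_means,
        main = "Mean Admission Rate by Control (toy data)",
        ylab = "Mean admission rate")
```

The result is a named vector with one mean per control type, which can be passed straight to `barplot()`.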