Introduction

This report investigates various descriptive aspects of the “CollegeScores4yr” data set. Using Chapter 6 methods (including mean, median, variance, standard deviation, correlation, histograms, box plots, bar plots, and pie charts), we address ten focused research questions. These questions were developed in two ways: first, by independent thinking and second, by consulting ChatGPT. The final set of questions covers diverse perspectives while limiting the analysis to one or two variables at a time.

I propose the following questions based on my own understanding of the data.

  1. What is the mean average debt for students who complete the program?

  2. Do universities with higher cost have a higher completion rate?

  3. Are ACT/SAT scores higher on average at public or private universities?

  4. What is the correlation between net price and enrollment?

  5. Do schools with higher in-state tuition tend to have higher instructional spending per FTE student?

  6. What is the composition of schools based on control?

  7. Are there differences in instructional spending per FTE based on whether it is a main campus or a branch campus?

  8. Do different university types (public, private, profit) have differences in faculty salary?

  9. Does student population affect completion rate?

  10. How does school control affect admission rates?

Methodology

An overview of how I will explore each question:

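  1. What is the mean average debt for students who complete the program? I compute the mean of Debt, ignoring missing values.
  2. Do universities with higher cost have a higher completion rate? I compute the correlation between Cost and CompRate.
  3. Are ACT/SAT scores higher on average at public or private universities? I compare the means of MidACT and AvgSAT across the levels of Control.
  4. What is the correlation between net price and enrollment? I compute the correlation between NetPrice and Enrollment.
  5. Do schools with higher in-state tuition tend to have higher instructional spending per FTE student? I compute the correlation between TuitionIn and InstructFTE.
  6. What is the composition of schools based on control? I tabulate Control and draw a pie chart of the counts.
  7. Are there differences in instructional spending per FTE based on whether it is a main campus or a branch campus? I compare the mean of InstructFTE by Main (1 = main campus, 0 = branch campus).
  8. Do different university types (public, private, profit) have differences in faculty salary? I compare the mean of FacSalary by Control and draw box plots.
  9. Does student population affect completion rate? I compute the correlation between Enrollment and CompRate.
  10. How does school control affect admission rates? I compare the mean of AdmitRate by Control.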

Analysis

Q1: What is the mean average debt for students who complete the program?

## [1] 2365.655

Q2: Do universities with higher cost have a higher completion rate?

## [1] 0.5870019

Q3: Are ACT/SAT scores higher on average at public or private universities?

## # A tibble: 3 × 3
##   Control AvgACT AvgSAT
##   <chr>    <dbl>  <dbl>
## 1 Private   23.8  1146.
## 2 Profit    23.4  1124.
## 3 Public    23.0  1119.

Q4: What is the correlation between net price and enrollment?

## [1] -0.1392744

Q5: Do schools with higher in-state tuition tend to have higher instructional spending per FTE student?

## [1] 0.4284533

Q6: What is the composition of schools based on control?

Q7: Are there differences in instructional spending per FTE based on whether it is a main campus or a branch campus?

## # A tibble: 2 × 2
##    Main AvgInstructFTE
##   <int>          <dbl>
## 1     0          5803.
## 2     1         11418.

Q8: Do different university types (public, private, profit) have differences in faculty salary?

## # A tibble: 3 × 2
##   Control AvgFacSalary
##   <chr>          <dbl>
## 1 Private        7091.
## 2 Profit         6234.
## 3 Public         8520.

Q9: Does student population affect completion rate?

## [1] 0.1678195

Q10: How does school control affect admission rates?

Conclusion

  1. The mean average debt for students who complete the program is $2,365.66. This indicates that, on average, students who graduate from these institutions accumulate approximately $2,366 in debt.

  2. The correlation between university cost and completion rate is 0.587, indicating a moderate positive relationship. This suggests that universities with higher costs tend to have higher completion rates, though other factors may also influence this trend.

  3. On average, students at private universities have slightly higher test scores (ACT 23.8, SAT 1146) than those at for-profit (ACT 23.4, SAT 1124) and public institutions (ACT 23.0, SAT 1119).

  4. The correlation between average net price and enrollment is -0.139, indicating a very weak negative relationship. This suggests that as the net price of a university increases, enrollment tends to decrease slightly, but the relationship is not strong. Other factors, such as financial aid, school reputation, and program availability, may also influence enrollment numbers.

  5. The correlation between in-state tuition and instructional spending per FTE student is 0.428, indicating a moderate positive relationship. This suggests that schools with higher in-state tuition tend to allocate more resources toward instructional spending per student, but the relationship is not very strong. Other factors, such as endowments, state funding, and institutional priorities, may also impact instructional spending levels.

  6. Of the 2,012 schools in the data set, 1,243 (about 62%) are private, 599 (about 30%) are public, and 170 (about 8%) are for-profit, so private institutions make up the majority.

  7. Main campuses spend an average of $11,418 per FTE student on instruction, nearly double the $5,803 spent at branch campuses. This gap may reflect differences in funding, resources, or institutional priorities.

  8. Average monthly faculty salary is highest at public universities ($8,520), followed by private universities ($7,091) and for-profit institutions ($6,234). The difference could be due to funding sources, institutional priorities, or differences in faculty qualifications and workload.

  9. The correlation between student population and completion rate is 0.168, indicating a weak positive relationship. Because the correlation is close to 0, enrollment size tells us little about completion rate.

  10. Comparing mean admission rates by control suggests that private institutions are the most selective, followed by public and then for-profit institutions.
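
One way to make labels such as "moderate" and "weak" more concrete is to square each correlation, which gives the share of variance in one variable that a straight-line fit on the other would account for. The sketch below simply reuses the four correlations reported in the Analysis section; the vector name r and its labels are illustrative.

```r
# Correlations copied from the Analysis section (Q2, Q4, Q5, Q9)
r <- c(Q2 = 0.5870019, Q4 = -0.1392744, Q5 = 0.4284533, Q9 = 0.1678195)

# Squared correlation: proportion of variance explained by a linear fit
r_squared <- round(r^2, 3)
print(r_squared)
```

On this reading, cost accounts for roughly a third of the variation in completion rate, while the two weak correlations (Q4 and Q9) each correspond to under 3%.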

When interpreting data, it is crucial to distinguish between correlation and causation. For instance, question 2 found a moderate positive correlation (0.587) between cost and completion rate. Some might infer that charging more causes more students to finish their programs. However, we lack sufficient information to support this claim, since other characteristics of the schools may drive both variables. The only conclusions we can draw are those directly supported by the calculations and visual representations of the data.

Appendix

Q1: What is the mean average debt for students who complete the program?

mean(college$Debt, na.rm=TRUE)
## [1] 2365.655

Q2: Do universities with higher cost have a higher completion rate?

# Calculate the correlation between Cost and Completion Rate
cor(college$Cost, college$CompRate, use = "complete.obs")
## [1] 0.5870019
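
The use = "complete.obs" argument matters because cor() otherwise returns NA as soon as either column contains a missing value. A minimal sketch with made-up vectors (x and y are hypothetical, not columns from the data set) shows the difference:

```r
# Hypothetical toy vectors with one missing value each
x <- c(1, 2, 3, NA, 5)
y <- c(2, 4, NA, 8, 10)

# Default behaviour: any NA makes the result NA
r_default <- cor(x, y)

# complete.obs keeps only the rows where both values are present
r_complete <- cor(x, y, use = "complete.obs")

print(r_default)
print(r_complete)
```

Only the three complete pairs are used here, so the sample behind each reported correlation can be smaller than the total number of schools.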

Q3: Are ACT/SAT scores higher on average at public or private universities?

# Calculate the average ACT and SAT scores by school control
library(dplyr)  # provides %>%, group_by(), and summarize()
testScores <- college %>%
  group_by(Control) %>%
  summarize(
    AvgACT = mean(MidACT, na.rm = TRUE),
    AvgSAT = mean(AvgSAT, na.rm = TRUE)
  )

# Display the result
testScores
## # A tibble: 3 × 3
##   Control AvgACT AvgSAT
##   <chr>    <dbl>  <dbl>
## 1 Private   23.8  1146.
## 2 Profit    23.4  1124.
## 3 Public    23.0  1119.

Q4: What is the correlation between net price and enrollment?

cor(college$NetPrice, college$Enrollment, use = "complete.obs")
## [1] -0.1392744

Q5: Do schools with higher in-state tuition tend to have higher instructional spending per FTE student?

cor(college$TuitionIn, college$InstructFTE, use = "complete.obs")
## [1] 0.4284533

Q6: What is the composition of schools based on control?

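# Count schools by control type and keep the table for the pie chart
control_summary <- table(college$Control)
control_summary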
## 
## Private  Profit  Public 
##    1243     170     599
pie(control_summary, 
    main = "Composition of Schools Based on Control",
    col = rainbow(length(control_summary)),
    labels = names(control_summary))
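
Since pie slices can be hard to compare by eye, the same counts can also be turned into percentages and shown as a bar plot. This sketch re-enters the counts from the table above so that it runs on its own; the names counts and pct are illustrative.

```r
# Counts copied from the table(college$Control) output above
counts <- c(Private = 1243, Profit = 170, Public = 599)

# Convert counts to percentages of all schools
pct <- round(100 * prop.table(counts), 1)
print(pct)

# Bar plot of the percentages, one bar per control type
barplot(pct,
        main = "Composition of Schools Based on Control",
        ylab = "Percent of schools",
        col = "steelblue")
```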

Q7: Are there differences in instructional spending per FTE based on whether it is a main campus or a branch campus?

spendingByCampus <- college %>%
  group_by(Main) %>%
  summarize(AvgInstructFTE = mean(InstructFTE, na.rm = TRUE))

spendingByCampus
## # A tibble: 2 × 2
##    Main AvgInstructFTE
##   <int>          <dbl>
## 1     0          5803.
## 2     1         11418.

Q8: Do different university types (public, private, profit) have differences in faculty salary?

# Compare average faculty salary across public, private, and for-profit universities
salaryByControl <- college %>%
  group_by(Control) %>%
  summarize(AvgFacSalary = mean(FacSalary, na.rm = TRUE))

salaryByControl
## # A tibble: 3 × 2
##   Control AvgFacSalary
##   <chr>          <dbl>
## 1 Private        7091.
## 2 Profit         6234.
## 3 Public         8520.
# Split the data by control type so each group gets its own box plot
collegePublic <- subset(college, Control == "Public")
collegePrivate <- subset(college, Control == "Private")
collegeProfit <- subset(college, Control == "Profit")

boxplot(FacSalary ~ Control,
        data = collegePublic,
        main = "Faculty Monthly Salary for Public Universities",
        col = "orange", # Color for the public universities
        ylab = "Faculty Monthly Salary",
        horizontal = FALSE)

boxplot(FacSalary ~ Control,
        data = collegePrivate,
        main = "Faculty Monthly Salary for Private Universities",
        col = "red", # Color for the private universities
        ylab = "Faculty Monthly Salary",
        horizontal = FALSE)

boxplot(FacSalary ~ Control,
        data = collegeProfit,
        main = "Faculty Monthly Salary for Profit Universities",
        col = "blue", # Color for the profit universities
        ylab = "Faculty Monthly Salary",
        horizontal = FALSE)
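
A single call with all three groups puts the boxes on one shared axis, which can make the comparison more direct than three separate plots. The sketch below uses a small made-up data frame (demo, with invented salaries) purely so that it runs on its own; with the real data, the same call would presumably take data = college.

```r
# Hypothetical stand-in for the college data (values are invented)
demo <- data.frame(
  Control   = rep(c("Public", "Private", "Profit"), each = 4),
  FacSalary = c(8000, 8500, 9000, 8600,
                6500, 7000, 7500, 7400,
                5800, 6200, 6500, 6400)
)

# One box per control type on a shared salary axis
bp <- boxplot(FacSalary ~ Control,
              data = demo,
              main = "Faculty Monthly Salary by Control",
              xlab = "Control",
              ylab = "Faculty Monthly Salary",
              col  = c("red", "blue", "orange"))

# Group labels in the order they are drawn
print(bp$names)
```

The groups are drawn in alphabetical order (Private, Profit, Public), so the colors are listed in that order to match the separate plots.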

Q9: Does student population affect completion rate?

cor(college$Enrollment, college$CompRate, use = "complete.obs")
## [1] 0.1678195

Q10: How does school control affect admission rates?

mean_admission_rate <- tapply(college$AdmitRate, college$Control, mean, na.rm = TRUE)
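
The tapply() call stores the group means but does not display them. A minimal sketch of printing and plotting such a result follows; it uses a made-up data frame (demo, with invented admission rates) so that it runs on its own, and the bar plot is one possible display rather than necessarily the figure used in this report.

```r
# Hypothetical stand-in for the college data (values are invented)
demo <- data.frame(
  Control   = c("Public", "Public", "Private", "Private", "Profit", "Profit"),
  AdmitRate = c(0.70, 0.80, 0.60, NA, 0.75, 0.85)
)

# Mean admission rate per control type, skipping missing values
demo_rate <- tapply(demo$AdmitRate, demo$Control, mean, na.rm = TRUE)
print(round(demo_rate, 2))

# One bar per control type
barplot(demo_rate,
        main = "Mean Admission Rate by Control",
        ylab = "Mean admission rate",
        ylim = c(0, 1))
```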