Introduction

In this project, I’ll be working with the CollegeScores4yr dataset from Lock5Stat to explore various factors that may influence college outcomes. This dataset provides a range of variables related to four-year colleges in the U.S., covering topics like graduation rates, cost of attendance, student demographics, and financial aid.

knitr::opts_chunk$set(echo = FALSE)
# Load readxl, which provides read_excel()
library(readxl)
# Load data from the Excel file
data <- read_excel("CollegeScores4yr.xlsx")

Part 1: My Initial Questions

As I reviewed the variable descriptions in the CollegeScores4yr dataset, I came up with the following questions:

  1. What’s the average graduation rate across all colleges?
  2. How does the cost of attendance vary by state?
  3. Do public and private colleges have different graduation rates?
  4. Is there a relationship between median family income and average SAT scores?
  5. Are higher student-to-faculty ratios associated with lower graduation rates?
  6. Does the acceptance rate correlate with SAT scores?
  7. How does financial aid vary between colleges with high and low graduation rates?
  8. Are there regional trends in graduation rates?
  9. What percentage of colleges have a graduation rate above 80%?
  10. How does the percentage of underrepresented minority students vary by state?

Part 2: Questions Suggested by ChatGPT

ChatGPT provided an alternative set of questions focusing on statistical relationships and broader patterns:

  1. What is the correlation between cost of attendance and graduation rate?
  2. Does a higher student-to-faculty ratio impact the SAT score averages?
  3. How does graduation rate differ by college type (public vs. private)?
  4. What is the average acceptance rate across all colleges?
  5. Is there a relationship between college size and average cost of attendance?
  6. How does median family income relate to student debt upon graduation?
  7. Do colleges with higher average SAT scores have higher graduation rates?
  8. What is the impact of location (e.g., urban vs. rural) on graduation rate?
  9. How do graduation rates vary by the level of financial aid provided?
  10. Is there a significant relationship between the percentage of minority students and graduation rate?

Part 3: Comparing Similar Questions

After comparing both sets, I noticed overlap in areas like:

  1. Graduation rates by college type (public vs. private)
  2. The impact of financial aid on graduation rates
  3. Student-to-faculty ratios and outcomes
  4. Cost-related factors like family income and attendance costs

Part 4: Final Selection of Ten Questions for Analysis

After reviewing both sets, I selected these final ten questions for in-depth analysis:

  1. What’s the average graduation rate across all colleges?
  2. How do graduation rates differ between public and private institutions?
  3. Is there a connection between median family income and SAT scores?
  4. How does attendance cost vary by college size?
  5. Is there a significant correlation between attendance costs and graduation rates?
  6. Do graduation rates correlate with the percentage of minority students?
  7. How does the student-to-faculty ratio impact SAT scores?
  8. What percentage of colleges have graduation rates above 80%?
  9. Is there a relationship between acceptance rates and SAT scores?
  10. How does the level of financial aid impact graduation rates?

Data Collection and Cleaning

##        Name       State          ID        Main      Accred  MainDegree 
##           0           0           0           0           0           0 
##  HighDegree     Control      Region      Locale    Latitude   Longitude 
##           0           0           0           0           0           0 
##   AdmitRate      MidACT      AvgSAT      Online  Enrollment       White 
##         360         760         735           0           1           1 
##       Black    Hispanic       Asian       Other    PartTime    NetPrice 
##           1           1           1           1           1         162 
##        Cost   TuitionIn   TuitonOut  TuitionFTE InstructFTE   FacSalary 
##         162          94          94           2           2          54 
## FullTimeFac        Pell    CompRate        Debt      Female    FirstGen 
##         127           5         167         152         166         225 
##   MedIncome 
##          51
## [1] 1256   37
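The output above has the shape of a per-column count of missing values followed by the dimensions of the data frame. As a minimal sketch of how such a summary could be produced in base R, the example below uses a small made-up data frame (`toy`, with invented values) in place of the real data:

```r
# Hypothetical stand-in for the real data; all values are invented
toy <- data.frame(
  Name     = c("A", "B", "C", "D"),
  AvgSAT   = c(1100, NA, 1250, NA),
  Cost     = c(24000, 31000, NA, 18000),
  CompRate = c(55.2, 61.0, 72.4, 48.9)
)

# Count the missing values in each column
missing_counts <- colSums(is.na(toy))
print(missing_counts)

# Number of rows and columns
print(dim(toy))
```

Columns with many missing values (such as AvgSAT and MidACT in the real output) shrink the usable sample for any statistic that involves them.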

Outlier Analysis

Outliers can reveal unique cases or unusual data points. Here, we’ll identify outliers in some key variables like “Cost” and “MedIncome” to understand the range and detect any extreme values.

The boxplots highlight any extreme values in college costs and median family income. Outliers in these variables could point to unusually high-cost institutions or to colleges whose students come from families with very high or very low incomes.
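For reference, a boxplot in R flags a point as an outlier when it lies more than 1.5 times the interquartile range beyond the hinges. The sketch below applies that same rule numerically with `boxplot.stats()`; the `cost` vector is hypothetical and stands in for the real "Cost" column:

```r
# Hypothetical cost values (invented); the 95000 entry is deliberately extreme
cost <- c(18000, 21000, 24000, 26000, 29000, 31000, 33000, 95000, NA)

# boxplot.stats() ignores NAs and returns, in $out, the points a boxplot
# would draw as outliers (more than 1.5 * IQR beyond the hinges)
outliers <- boxplot.stats(cost)$out
print(outliers)
```

Listing the flagged values this way makes it easier to check whether an outlier is a data-entry problem or a genuinely unusual college.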

Descriptive Statistics

  1. Average Graduation Rate
## [1] 56.5987

The average graduation rate is about 56.6%, which provides a baseline for comparison: colleges well above this figure graduate a larger share of their students than is typical.

  2. Graduation Rates by Institution Type
## # A tibble: 3 × 2
##   Control average_graduation_rate
##   <chr>                     <dbl>
## 1 Private                    59.4
## 2 Profit                     50.0
## 3 Public                     52.2

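Private colleges have the highest average graduation rate (59.4%), followed by public colleges (52.2%) and for-profit colleges (50.0%, the Profit group). These gaps are descriptive only; they may reflect factors like funding or student demographics rather than institutional control itself.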
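A group average like the one in this table can be computed in several ways. The sketch below uses base R's `tapply()` on a made-up data frame (`toy`, with invented values); the column names Control and CompRate are taken from the variable list, and treating CompRate as the graduation rate is an assumption:

```r
# Hypothetical data; values are invented for illustration
toy <- data.frame(
  Control  = c("Private", "Private", "Profit", "Public", "Public", "Public"),
  CompRate = c(60, 70, 40, 50, NA, 54)
)

# Mean completion rate within each control type, skipping missing values
avg_by_control <- tapply(toy$CompRate, toy$Control, mean, na.rm = TRUE)
print(avg_by_control)
```

Without `na.rm = TRUE`, any group containing a missing rate would return NA instead of an average.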

  3. Median Family Income vs. SAT Scores
## [1] 0.5830873

The correlation of about 0.58 is moderately positive: colleges whose students come from higher-income families tend to report higher average SAT scores. This is consistent with socioeconomic factors influencing SAT results.
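Because AvgSAT and MedIncome both contain missing values, a correlation between them has to say how incomplete rows are handled; by default `cor()` returns NA if any value is missing. A minimal sketch with invented vectors (`income` and `sat` are hypothetical stand-ins for the real columns):

```r
# Hypothetical values; NA marks a college that did not report the figure
income <- c(40000, 55000, 62000, NA, 80000, 95000)
sat    <- c(980, 1050, NA, 1120, 1190, 1300)

# With missing values present, the default call returns NA
r_default <- cor(income, sat)

# use = "complete.obs" keeps only the pairs where both values are present
r <- cor(income, sat, use = "complete.obs")
print(r)
```

The same `use` argument applies to the other correlations in this section, so each one is computed on a possibly different subset of colleges.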

  4. Attendance Cost by College Size
## # A tibble: 1,151 × 2
##    Enrollment avg_cost
##         <dbl>    <dbl>
##  1         25   14795 
##  2         59     NaN 
##  3         63   24909 
##  4         67   35522 
##  5         75   17415 
##  6         77   41282 
##  7         78   18863 
##  8         90   35504.
##  9         98   67350 
## 10        100   28994 
## # ℹ 1,141 more rows

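Because this table groups by exact enrollment, it has 1,151 rows, almost one per college, so it does not yet summarize cost by size. The NaN most likely marks an enrollment value for which no cost was reported. Grouping colleges into enrollment bands would show more clearly whether larger or smaller colleges tend to have higher or lower costs, which may impact affordability and access.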
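One way to make a size comparison easier to read is to group enrollment into a few bands before averaging. The sketch below uses `cut()` on made-up data; the band boundaries (1,000 and 5,000 students) and the labels are arbitrary assumptions chosen only for illustration:

```r
# Hypothetical data; values are invented for illustration
toy <- data.frame(
  Enrollment = c(25, 450, 900, 2500, 4800, 12000, 30000),
  Cost       = c(15000, 30000, NA, 28000, 35000, 24000, 26000)
)

# Assign each college to a size band; the cutoffs are assumptions
toy$SizeBand <- cut(
  toy$Enrollment,
  breaks = c(0, 1000, 5000, Inf),
  labels = c("Small", "Medium", "Large")
)

# Average cost within each band, skipping missing costs
avg_cost_by_band <- tapply(toy$Cost, toy$SizeBand, mean, na.rm = TRUE)
print(avg_cost_by_band)
```

With three bands instead of over a thousand groups, differences in average cost between small and large colleges become directly comparable.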

  5. Correlation Between Attendance Cost and Graduation Rate
## [1] 0.5823886

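The correlation of about 0.58 is moderately positive: more expensive colleges tend to have higher graduation rates. This fits the idea that cost reflects the resources a college provides rather than acting as a barrier to completion, although correlation alone cannot confirm this.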

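  6. Graduation Rate vs. Percentage of Minority Students

This plot shows whether colleges with a higher percentage of minority students tend to have different graduation rates, which could point to the effectiveness of inclusivity efforts.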

  7. Student-to-Faculty Ratio Impact on SAT Scores
## [1] 0.6254544

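The correlation of about 0.63 is positive, which does not match the idea that lower student-to-faculty ratios go with higher SAT scores; that would show up as a negative correlation. Two cautions apply. First, the dataset has no direct student-to-faculty ratio column (the faculty-related variables are InstructFTE, FacSalary, and FullTimeFac), so the result depends on the stand-in variable used. Second, SAT scores are earned before enrollment, so they describe a college's incoming students rather than an effect of class size or personalized attention.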

  8. Colleges with Graduation Rates Above 80%
## [1] 145

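There are 145 colleges with graduation rates above 80%. Since the question asks for a percentage, this count still needs to be divided by the number of colleges that report a graduation rate. Identifying these colleges helps highlight successful institutions and establish benchmarks for excellence.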
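The question asks for a percentage, while the output above is a count. A minimal sketch of converting one to the other, using an invented vector (`comp_rate` is a hypothetical stand-in for the graduation rate column) and counting only colleges that report a rate in the denominator:

```r
# Hypothetical graduation rates in percent; NA means not reported
comp_rate <- c(45, 62, 81, 90, NA, 79, 85, 80)

# Colleges strictly above 80%
n_above <- sum(comp_rate > 80, na.rm = TRUE)

# Colleges with a reported rate (the denominator)
n_reported <- sum(!is.na(comp_rate))

# Share of reporting colleges above the threshold, in percent
pct_above <- 100 * n_above / n_reported
print(round(pct_above, 1))
```

Dividing by the total number of rows instead would count colleges with a missing rate as if they were below 80%, which understates the share.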

  9. Acceptance Rate vs. SAT Scores
## [1] -0.4201928

The correlation of about -0.42 is moderately negative: colleges that admit a smaller share of applicants tend to have higher average SAT scores. This suggests that more selective institutions attract higher-achieving students.

  10. Financial Aid Impact on Graduation Rates
## [1] -0.6657039

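The correlation of about -0.67 is fairly strong and negative: colleges where the financial aid measure is higher tend to have lower graduation rates. This does not mean that aid lowers graduation rates. A more likely explanation is that colleges with many aid recipients enroll more students facing financial hardship, which is itself linked to lower completion. The result still highlights the importance of accessible education funding.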

Data Visualization

The histogram shows the spread of graduation rates across colleges, helping identify common success levels and outliers.

This scatter plot shows how median family income relates to average SAT scores, that is, whether colleges whose students come from wealthier families tend to report higher scores.

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

A trend in this plot would indicate whether larger or smaller institutions tend to have higher attendance costs, which could affect student choices.

Interpretation and Conclusions

This analysis points to several factors associated with college outcomes, including family income, financial aid, cost of attendance, and institution type. Family income shows a moderate positive correlation with SAT scores (about 0.58), cost shows a similar correlation with graduation rates (about 0.58), and the financial aid measure is negatively correlated with graduation rates (about -0.67). Together these suggest that socioeconomic elements and institutional characteristics play important roles in student achievement. The descriptive statistics and visualizations show correlation; they do not establish causation.

Patterns related to minority representation and financial aid may reflect broader systemic trends that affect student outcomes on a national scale, which underlines the importance of inclusive and accessible support structures. Future research could apply inferential statistics to test whether these relationships are stronger than chance alone would produce, and use study designs suited to causal questions, offering more actionable insights for policy and educational planning.
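As one example of the inferential step mentioned above, base R's `cor.test()` reports a p-value and a confidence interval alongside the correlation estimate. The sketch below runs it on invented vectors (`cost` and `grad` are hypothetical, not the real columns), so the numbers it prints say nothing about the actual dataset:

```r
# Hypothetical values: cost in thousands of dollars, graduation rate in percent
cost <- c(10, 20, 30, 40, 50, 60, 70, 80)
grad <- c(40, 45, 52, 50, 63, 70, 68, 80)

# Pearson correlation test; incomplete pairs would be dropped automatically
result <- cor.test(cost, grad)

print(result$estimate)  # sample correlation
print(result$p.value)   # p-value for the null hypothesis of zero correlation
print(result$conf.int)  # 95% confidence interval for the correlation
```

A small p-value here would indicate that the association is unlikely to be due to chance alone; it would still not show that one variable causes the other.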

Citations and References

Lock5Stat. (2024). CollegeScores4yr dataset. Retrieved from https://www.lock5stat.com/datapage3e.html