1. Introduction

This report investigates ten data-driven questions about colleges using the dataset ‘CollegeScores4yr.csv’. The focus is on applying descriptive statistical tools: the mean, median, variance, standard deviation, and correlation, along with boxplots and histograms.

My personal questions:

1. What is the average faculty salary at colleges with strong engineering programs?
2. What is the median graduation rate for public vs. private colleges?
3. What’s the variance in student enrollment across different college types?
4. Is there a correlation between average student debt and median earnings after graduation?
5. What’s the average net price for in-state students compared to out-of-state?
6. What is the standard deviation of SAT scores among selective colleges?
7. Do rural colleges tend to have lower admission rates than urban ones?
8. What is the distribution of student-to-faculty ratios across U.S. colleges?
9. What’s the average percent of first-generation college students at community colleges?
10. How does average tuition compare between HBCUs and non-HBCUs?

ChatGPT questions:

1. What is the mean net price across all colleges?
2. What is the median admission rate by region?
3. What is the standard deviation of average faculty salary?
4. What is the variance in undergraduate enrollment?
5. What is the correlation between average SAT scores and admission rates?
6. How does tuition (in-state vs out-of-state) compare using a boxplot?
7. What is the distribution (histogram) of completion rates across colleges?
8. What is the average percent of female students by region?
9. What is the correlation between student loan debt and net price?
10. Use a barplot to compare median ACT scores across different regions.

Final Questions:

1.  What is the mean net price across all colleges?
2.  What is the median graduation rate for public vs. private colleges?
3.  What is the correlation between average SAT scores and admission rates?
4.  What is the variance in undergraduate enrollment?
5.  What is the average faculty salary at colleges with strong engineering programs?
6.  Use a boxplot to compare in-state vs out-of-state tuition.
7.  What are the median earnings after graduation in urban vs rural areas?
8.  What is the standard deviation of SAT scores among selective colleges?
9.  What is the average percent of female students by region?
10. Use a histogram to show the distribution of completion rates.

------------------------------------------------------------------------

2. Load and Preview Data

college <- read.csv("CollegeScores4yr.csv")
head(college)
##                                  Name State     ID Main
## 1            Alabama A & M University    AL 100654    1
## 2 University of Alabama at Birmingham    AL 100663    1
## 3                  Amridge University    AL 100690    1
## 4 University of Alabama in Huntsville    AL 100706    1
## 5            Alabama State University    AL 100724    1
## 6           The University of Alabama    AL 100751    1
##                                                                Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
##   MainDegree HighDegree Control    Region Locale Latitude Longitude AdmitRate
## 1          3          4  Public Southeast   City 34.78337 -86.56850    0.9027
## 2          3          4  Public Southeast   City 33.50570 -86.79935    0.9181
## 3          3          4 Private Southeast   City 32.36261 -86.17401        NA
## 4          3          4  Public Southeast   City 34.72456 -86.64045    0.8123
## 5          3          4  Public Southeast   City 32.36432 -86.29568    0.9787
## 6          3          4  Public Southeast   City 33.21187 -87.54598    0.5330
##   MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1     18    929      0       4824   2.5  90.7      0.9   0.2   5.6      6.6
## 2     25   1195      0      12866  57.8  25.9      3.3   5.9   7.1     25.2
## 3     NA     NA      1        322   7.1  14.3      0.6   0.3  77.6     54.4
## 4     28   1322      0       6917  74.2  10.7      4.6   4.0   6.5     15.0
## 5     18    935      0       4189   1.5  93.8      1.0   0.3   3.5      7.7
## 6     28   1278      0      32387  78.5  10.1      4.7   1.2   5.6      7.9
##   NetPrice  Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1    15184 22886      9857     18236       9227        7298      6983
## 2    17535 24129      8328     19032      11612       17235     10640
## 3     9649 15080      6900      6900      14738        5265      3866
## 4    19986 22108     10280     21480       8727        9748      9391
## 5    12874 19413     11068     19396       9003        7983      7399
## 6    21973 28836     10780     28100      13574       10894     10016
##   FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1        71.3 71.0    23.96 1068   56.4     36.6      23.6
## 2        89.9 35.3    52.92 3755   63.9     34.1      34.5
## 3       100.0 74.2    18.18  109   64.9     51.3      15.0
## 4        64.6 27.7    48.62 1347   47.6     31.0      44.8
## 5        54.2 73.8    27.69 1294   61.3     34.3      22.1
## 6        74.0 18.0    67.87 6430   61.5     22.6      66.7

3. Analysis

1. What is the mean net price across all colleges?

mean(college$NetPrice, na.rm = TRUE)
## [1] 19886.82

2. What is the median graduation rate for public vs. private colleges?

tapply(college$CompRate, college$Control, median, na.rm = TRUE)
## Private  Profit  Public 
##  56.240  26.390  48.575

3. What is the correlation between average SAT scores and admission rates?

cor(college$AvgSAT, college$AdmitRate, use = "complete.obs")
## [1] -0.4221255

4. What is the variance in undergraduate enrollment?

var(college$Enrollment, na.rm = TRUE)
## [1] 55846805
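Because variance is measured in squared units (students squared here), it is hard to read directly. As a small worked step, taking the square root of the value printed above gives the standard deviation in students. The variable names `enrollment_var`, `enrollment_sd`, and `enrollment_preview` are made up for this sketch; the six enrollment values come from the `head(college)` preview.

```r
# Variance reported in the output above (units: students squared)
enrollment_var <- 55846805

# The standard deviation is the square root of the variance (units: students)
enrollment_sd <- sqrt(enrollment_var)
round(enrollment_sd)  # 7473

# Same relationship checked on the six Enrollment values from the preview rows
enrollment_preview <- c(4824, 12866, 322, 6917, 4189, 32387)
all.equal(sd(enrollment_preview), sqrt(var(enrollment_preview)))  # TRUE
```

A standard deviation of roughly 7,473 students is easier to set against typical enrollment figures than the raw variance.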

5. What is the average faculty salary at colleges with strong engineering programs?

quantile(college$AvgSAT, 0.75, na.rm = TRUE)  # Top 25% cutoff
##  75% 
## 1198
engineering <- subset(college, AvgSAT >= quantile(college$AvgSAT, 0.75, na.rm = TRUE) & MidACT >= 27)
mean(engineering$FacSalary, na.rm = TRUE)
## [1] 11327.41
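The mean above describes only the high-score subset, so on its own it cannot show whether those colleges pay more than the rest. One possible way to get a comparison, sketched here on the six preview rows from `head(college)` rather than the full file, is to flag the subset and compute both group means. The names `preview`, `sat_cutoff`, `high_scores`, and `group_means` are assumptions introduced for this illustration, and the cutoff of 1198 is the 75th percentile printed above.

```r
# Six rows copied from the head(college) preview (not the full dataset)
preview <- data.frame(
  AvgSAT    = c(929, 1195, NA, 1322, 935, 1278),
  MidACT    = c(18, 25, NA, 28, 18, 28),
  FacSalary = c(6983, 10640, 3866, 9391, 7399, 10016)
)

sat_cutoff <- 1198  # 75th percentile of AvgSAT from the full-data output above

# TRUE only when both scores are present and meet the thresholds;
# rows with missing scores are placed in the FALSE group
high_scores <- with(preview,
  !is.na(AvgSAT) & !is.na(MidACT) & AvgSAT >= sat_cutoff & MidACT >= 27)

# Mean FacSalary for each group (names are "FALSE" and "TRUE")
group_means <- tapply(preview$FacSalary, high_scores, mean, na.rm = TRUE)
group_means
```

Six rows are far too few to draw a conclusion; the point is only that a "higher salary" claim needs both group means from the full data.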

6. Use a boxplot to compare in-state vs out-of-state tuition.

boxplot(college$TuitionIn, college$TuitonOut,
        names = c("In-State", "Out-of-State"),
        col = c("lightgreen", "lightblue"),
        main = "Tuition Comparison",
        ylab = "Tuition ($)")
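Side-by-side boxplots compare the two distributions but hide the fact that each college contributes one value to each box. A complementary check, sketched here on the six preview rows only, is the per-college gap between the two tuition figures. The names `preview` and `tuition_gap` are assumptions for this illustration; `TuitonOut` is spelled as it appears in the dataset's column header.

```r
# Six rows copied from the head(college) preview (not the full dataset)
preview <- data.frame(
  Control   = c("Public", "Public", "Private", "Public", "Public", "Public"),
  TuitionIn = c(9857, 8328, 6900, 10280, 11068, 10780),
  TuitonOut = c(18236, 19032, 6900, 21480, 19396, 28100)
)

# Out-of-state minus in-state tuition for each college
tuition_gap <- preview$TuitonOut - preview$TuitionIn
tuition_gap  # 8379 10704 0 11200 8328 17320

# Median gap by type of control
tapply(tuition_gap, preview$Control, median)
```

In these rows the one private college has a gap of zero, which suggests checking the full data for how many colleges charge the same price to both groups.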

7. What are the median earnings after graduation in urban vs rural areas?

tapply(college$MedIncome, college$Locale, median, na.rm = TRUE)
##   City  Rural Suburb   Town 
##   39.4   39.7   46.0   47.8

8. What is the standard deviation of SAT scores among selective colleges?

selective <- subset(college, AdmitRate < 0.5)
sd(selective$AvgSAT, na.rm = TRUE)
## [1] 197.1435
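A single standard deviation does not say whether selective colleges are more or less consistent than colleges overall; that needs the same statistic for the comparison group. The sketch below shows the mechanics on hypothetical values, because none of the six preview rows has an admission rate below 0.5. The data frame `toy` and every number in it are invented for illustration and are not taken from CollegeScores4yr.csv.

```r
# Hypothetical values for illustration only; NOT from CollegeScores4yr.csv
toy <- data.frame(
  AdmitRate = c(0.20, 0.35, 0.45, 0.60, 0.75, 0.90, NA),
  AvgSAT    = c(1500, 1300, 1100, 1150, 1050, 1000, 1200)
)

# subset() drops rows where the condition is FALSE or NA
selective_toy <- subset(toy, AdmitRate < 0.5)

# Spread of SAT scores within the selective group and across all rows
sd_selective <- sd(selective_toy$AvgSAT, na.rm = TRUE)
sd_all       <- sd(toy$AvgSAT, na.rm = TRUE)
c(selective = sd_selective, all = sd_all)
```

Running the same two `sd()` calls on `college` and `selective` would show which group actually has the larger spread.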

9. What is the average percent of female students by region?

tapply(college$Female, college$Region, mean, na.rm = TRUE)
##   Midwest Northeast Southeast Territory      West 
##  58.73613  59.44461  59.47673  56.61702  59.86353

10. Use a histogram to show the distribution of completion rates.

hist(college$CompRate, 
     main = "Histogram of Completion Rates", 
     xlab = "Completion Rate (%)", 
     col = "orange", 
     border = "white")
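The direction of skew can be hard to judge from a histogram by eye. A common rough check is to compare the mean with the median: a mean above the median hints at a tail toward high values (right skew), and a mean below it hints at a tail toward low values (left skew). The sketch below runs that comparison on the six CompRate values from the preview only; the names `comp_preview`, `comp_mean`, and `comp_median` are assumptions for this illustration.

```r
# CompRate values copied from the head(college) preview (not the full dataset)
comp_preview <- c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87)

comp_mean   <- mean(comp_preview, na.rm = TRUE)
comp_median <- median(comp_preview, na.rm = TRUE)

# Positive difference: mean pulled toward high values; negative: toward low values
comp_mean - comp_median
```

Six values say nothing about the full distribution; applying the same two calls to `college$CompRate` would give a quick cross-check on the histogram.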

4. Summary & Conclusion

In this project, I conducted an exploratory data analysis on U.S. college data using various descriptive statistics and visualization techniques. The goal was to uncover insights into financial, demographic, and academic characteristics of higher education institutions.

Key findings include:

The average net price across all colleges is approximately $19,887, providing a baseline for college affordability.

Median completion rates differ by type of control: private colleges have the highest median (56.2%), followed by public colleges (48.6%), with colleges labeled "Profit" well below both (26.4%).

There is a moderate negative correlation (r ≈ -0.42) between average SAT scores and admission rates: colleges that admit a smaller share of applicants tend to have higher average SAT scores.

The variance in enrollment is large (about 55.8 million, which corresponds to a standard deviation of roughly 7,473 students), reflecting the diversity in college sizes, from small liberal arts colleges to large state universities.

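The dataset has no engineering-specific variable, so high test scores (AvgSAT at or above the 75th percentile of 1198 and MidACT of at least 27) were used as a rough proxy. Colleges in this group have a mean FacSalary of about 11,327. No comparison group was computed, so this figure alone does not show that these colleges pay more than others.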

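A boxplot comparison of tuition shows that out-of-state tuition is generally higher than in-state tuition, with wider variation. The gap is not universal: Amridge University in the data preview, for example, lists the same figure (6,900) for both.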

Median earnings are nearly identical for colleges in cities (39.4) and rural areas (39.7), so the data do not show an urban advantage. Colleges in suburbs (46.0) and towns (47.8) report higher medians than either.

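Among selective colleges (admission rates below 50%), the standard deviation of average SAT scores is about 197 points. The standard deviation for all colleges was not computed, so this result alone does not show whether selective colleges are more or less consistent than the rest.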

The percentage of female students is fairly consistent across regions, with regional means ranging from about 56.6% (Territory) to 59.9% (West).

The histogram of completion rates indicates that many colleges have high completion rates while some lag behind; a tail toward low values like this is left skew rather than right skew.

Overall, this analysis revealed patterns and comparisons across a variety of college attributes. It shows how descriptive statistics can summarize large datasets and surface trends that can inform students, educators, and policymakers alike.