1. Introduction

We use the College Scores 4yr data set, loaded from lock5stat.com.

I propose the following 10 questions based on my own understanding of the data.

2. Analysis

We will explore the questions in detail.

college = read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)
##                                  Name State     ID Main
## 1            Alabama A & M University    AL 100654    1
## 2 University of Alabama at Birmingham    AL 100663    1
## 3                  Amridge University    AL 100690    1
## 4 University of Alabama in Huntsville    AL 100706    1
## 5            Alabama State University    AL 100724    1
## 6           The University of Alabama    AL 100751    1
##                                                                Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
##   MainDegree HighDegree Control    Region Locale Latitude Longitude AdmitRate
## 1          3          4  Public Southeast   City 34.78337 -86.56850    0.9027
## 2          3          4  Public Southeast   City 33.50570 -86.79935    0.9181
## 3          3          4 Private Southeast   City 32.36261 -86.17401        NA
## 4          3          4  Public Southeast   City 34.72456 -86.64045    0.8123
## 5          3          4  Public Southeast   City 32.36432 -86.29568    0.9787
## 6          3          4  Public Southeast   City 33.21187 -87.54598    0.5330
##   MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1     18    929      0       4824   2.5  90.7      0.9   0.2   5.6      6.6
## 2     25   1195      0      12866  57.8  25.9      3.3   5.9   7.1     25.2
## 3     NA     NA      1        322   7.1  14.3      0.6   0.3  77.6     54.4
## 4     28   1322      0       6917  74.2  10.7      4.6   4.0   6.5     15.0
## 5     18    935      0       4189   1.5  93.8      1.0   0.3   3.5      7.7
## 6     28   1278      0      32387  78.5  10.1      4.7   1.2   5.6      7.9
##   NetPrice  Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1    15184 22886      9857     18236       9227        7298      6983
## 2    17535 24129      8328     19032      11612       17235     10640
## 3     9649 15080      6900      6900      14738        5265      3866
## 4    19986 22108     10280     21480       8727        9748      9391
## 5    12874 19413     11068     19396       9003        7983      7399
## 6    21973 28836     10780     28100      13574       10894     10016
##   FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1        71.3 71.0    23.96 1068   56.4     36.6      23.6
## 2        89.9 35.3    52.92 3755   63.9     34.1      34.5
## 3       100.0 74.2    18.18  109   64.9     51.3      15.0
## 4        64.6 27.7    48.62 1347   47.6     31.0      44.8
## 5        54.2 73.8    27.69 1294   61.3     34.3      22.1
## 6        74.0 18.0    67.87 6430   61.5     22.6      66.7

Q1. What is the average undergraduate enrollment among the universities in the data?

mean(college$Enrollment, na.rm = TRUE)
## [1] 4484.831

The average undergraduate enrollment among the universities in the data is about 4,485 students.
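If enrollment is skewed by a few very large schools, the median is a useful companion to the mean. The sketch below is illustrative only: `toy_enrollment` holds the six enrollments visible in the head() output plus one artificial NA, so its results are not the full-data answer.

```r
# Six enrollments from the head() output plus one artificial missing value
toy_enrollment <- c(4824, 12866, 322, 6917, 4189, 32387, NA)

# Without na.rm = TRUE a single NA makes the mean NA
mean(toy_enrollment)

# Dropping the NA gives usable summaries
mean_enroll   <- mean(toy_enrollment, na.rm = TRUE)
median_enroll <- median(toy_enrollment, na.rm = TRUE)
mean_enroll
median_enroll
```

In this toy sample the mean is far above the median because one school is much larger than the rest, which is the pattern to check for in the full data.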

Q2. How much variation exists in in-state tuition and fees among the universities in the data?

var(college$TuitionIn, use = "complete.obs")
## [1] 199665280

The variance of in-state tuition in the data is 199,665,280 (dollars²).
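Because variance is measured in squared dollars, its square root (the standard deviation) is easier to interpret. A minimal sketch using the variance reported above; the variable names are only for illustration:

```r
# Variance of in-state tuition reported above, in squared dollars
tuition_var <- 199665280

# The standard deviation is back in plain dollars
tuition_sd <- sqrt(tuition_var)
round(tuition_sd)   # about 14130

# On the full data, sd(college$TuitionIn, na.rm = TRUE) computes this directly
```

So in-state tuition typically differs from its mean by roughly 14,000 dollars.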

Q3. What is the average total cost of attendance across universities, and how much does it vary in the data?

hist(college$Cost, main = "Distribution of Total Cost of Attendance", xlab = "Total Cost ($)", col = "lightblue", breaks = 30)

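The histogram shows that total costs are most concentrated between 20,000 dollars and 22,000 dollars. The distribution is right-skewed, with a tail of schools whose costs are much higher than this range. The histogram shows the spread but not the average itself; mean(college$Cost, na.rm = TRUE) and sd(college$Cost, na.rm = TRUE) would give numeric answers to both parts of the question.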
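Reading a range off a histogram by eye can be checked numerically, since hist() returns its bin edges and counts. The sketch below assumes explicit 2,000-dollar bins and uses only the six Cost values from the head() output (`toy_cost`), so the range it finds is not the full-data result.

```r
# Six total-cost values from the head() output, used only as a toy sample
toy_cost <- c(22886, 24129, 15080, 22108, 19413, 28836)

# Explicit break points make the bins exact; breaks = 30 is only a suggestion to hist()
h <- hist(toy_cost, breaks = seq(14000, 30000, by = 2000), plot = FALSE)

# Locate the bin with the highest count and report its edges
busiest     <- which.max(h$counts)
modal_range <- c(h$breaks[busiest], h$breaks[busiest + 1])
modal_range

# Numeric answers to the "average" and "how much does it vary" parts
mean(toy_cost)
sd(toy_cost)
```

Replacing `toy_cost` with college$Cost (after removing missing values and widening the breaks to cover its range) would apply the same check to the full data.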

Q4. What does the distribution of median family income look like across universities in the data?

hist(college$MedIncome, main = "Histogram of Median Family Income", xlab = "Median Family Income ($1000s)")

The histogram shows that most universities have a median family income between 20,000 dollars and 40,000 dollars (20 to 40 on the horizontal axis, because MedIncome is recorded in thousands of dollars).

Q5. Do universities with a higher percentage of white students tend to have a higher or lower percentage of female students in the data?

cor(college$White, college$Female, use = "complete.obs")
## [1] -0.09688832

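The correlation of about -0.10 is negative but very weak. Schools with a higher percentage of white students have only a slight tendency toward a lower percentage of female students, so there is essentially no linear relationship between the two.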
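One way to judge how weak this correlation is: squaring it gives the share of variance that a linear relationship would account for. A small sketch using the value reported above (the variable names are only for illustration):

```r
# Correlation between White and Female reported above
r_white_female <- -0.09688832

# Squared correlation: proportion of variance shared under a linear fit
r_squared <- r_white_female^2

# Expressed as a percentage, this is just under 1
round(100 * r_squared, 2)
```

With less than 1 percent of the variance shared, knowing a school's percentage of white students says almost nothing about its percentage of female students.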

Q6. Do universities with higher percentages of Pell-grant students tend to have lower median family income in the data?

plot(college$Pell, college$MedIncome, main = "Pell Grant Percentage vs Median Family Income", xlab = "Percent Receiving Pell Grants", ylab = "Median Family Income ($1000s)", col = "darkgreen")

The scatterplot shows that as the percentage of students receiving Pell Grants increases, the median family income of students tends to decrease.
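A scatterplot shows direction but not strength, so a correlation and a fitted slope can back up the visual impression. The sketch below is only a toy illustration: the data frame `toy` holds the six Pell and MedIncome pairs from the head() output, which are not representative of the full data.

```r
# Six (Pell, MedIncome) pairs taken from the head() output
toy <- data.frame(
  Pell      = c(71.0, 35.3, 74.2, 27.7, 73.8, 18.0),
  MedIncome = c(23.6, 34.5, 15.0, 44.8, 22.1, 66.7)
)

# Strength and direction of the linear association
r_toy <- cor(toy$Pell, toy$MedIncome)

# Slope: change in median income ($1000s) per one-point rise in Pell percentage
fit   <- lm(MedIncome ~ Pell, data = toy)
slope <- coef(fit)[["Pell"]]

r_toy
slope
```

Both values are negative for these six schools; running the same two calls on the `college` data frame (with use = "complete.obs" in cor) would quantify the trend seen in the plot.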

Q7. Is there a relationship between average SAT score and completion rate in the data?

cor(college$AvgSAT, college$CompRate, use = "complete.obs")
## [1] 0.8189495

The correlation of about 0.82 is strong and positive, so schools with higher average SAT scores tend to have higher completion rates.
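Because use = "complete.obs" silently drops schools with a missing value in either variable, it is worth reporting how many schools the correlation is actually based on. A minimal sketch, assuming the six AvgSAT and CompRate pairs from the head() output as toy vectors:

```r
# Six (AvgSAT, CompRate) pairs from the head() output; the third school has no SAT score
toy_sat  <- c(929, 1195, NA, 1322, 935, 1278)
toy_comp <- c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87)

# With the default settings a single NA makes the correlation NA
cor(toy_sat, toy_comp)

# Number of schools with both values present
n_used <- sum(complete.cases(toy_sat, toy_comp))

# Correlation computed on those complete pairs only
r_toy <- cor(toy_sat, toy_comp, use = "complete.obs")

n_used
r_toy
```

Here five of the six schools are used. Applying complete.cases() to college$AvgSAT and college$CompRate would show how much of the full data set supports the reported correlation.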

Q8. Do schools that are online-only have different average undergraduate enrollment than schools that are not in the data?

boxplot(college$Enrollment ~ college$Online, main = "Undergraduate Enrollment: Online vs On-Campus Schools", xlab = "Online-Only School", ylab = "Undergraduate Enrollment", names = c("No", "Yes"), col = c("lightblue", "lightgreen"))

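The boxplot comparing undergraduate enrollment for online-only and on-campus schools shows that online-only institutions generally have a smaller median enrollment than traditional on-campus universities. Note that a boxplot displays medians rather than means, so it answers the question about average enrollment only indirectly.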
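Since the question asks about average enrollment, group means can be computed directly alongside the medians the boxplot shows. The sketch below uses a toy data frame built from the six rows of the head() output (one of which is online-only), so the numbers are illustrative rather than the full-data answer.

```r
# Enrollment and Online flag for the six schools in the head() output
toy <- data.frame(
  Enrollment = c(4824, 12866, 322, 6917, 4189, 32387),
  Online     = c(0, 0, 1, 0, 0, 0)
)

# Mean and median enrollment within each group (0 = not online-only, 1 = online-only)
group_means   <- tapply(toy$Enrollment, toy$Online, mean, na.rm = TRUE)
group_medians <- tapply(toy$Enrollment, toy$Online, median, na.rm = TRUE)

group_means
group_medians
```

The same two tapply() calls on college$Enrollment and college$Online would give the group averages that the question asks about.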

Q9. Do universities with a higher average total cost also have a higher average student debt in the data?

cor(college$Cost, college$Debt, use = "complete.obs")
## [1] -0.2144525
plot(college$Cost, college$Debt, main = "Average Total Cost vs Average Student Debt", xlab = "Average Total Cost ($)", ylab = "Average Student Debt ($)", col = "darkblue")

The correlation between average total cost and average student debt is about -0.21, a weak negative relationship. Universities with higher total costs tend to have slightly lower average student debt, but the scatterplot shows a lot of scatter around this trend.

Q10. Do universities with higher median family income tend to have higher average net prices in the data?

plot(college$MedIncome, college$NetPrice, main = "Median Family Income vs Average Net Price", xlab = "Median Family Income ($1000s)", ylab = "Average Net Price ($)", col = "darkorange")

cor(college$MedIncome, college$NetPrice, use = "complete.obs")
## [1] 0.5151298

The correlation of about 0.52 is moderate and positive, so schools whose students come from wealthier families tend to have higher net prices. This may suggest that those students receive less financial aid, but the correlation alone does not establish that.
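The idea that students at wealthier schools get less financial help could be examined more directly. The sketch below assumes that net price is total cost minus aid, so that Cost - NetPrice approximates average aid; that definition and the `AidGap` column are assumptions made for illustration, and the six rows come from the head() output only.

```r
# Six rows from the head() output
toy <- data.frame(
  MedIncome = c(23.6, 34.5, 15.0, 44.8, 22.1, 66.7),
  Cost      = c(22886, 24129, 15080, 22108, 19413, 28836),
  NetPrice  = c(15184, 17535, 9649, 19986, 12874, 21973)
)

# Hypothetical aid proxy: how far net price sits below total cost
toy$AidGap <- toy$Cost - toy$NetPrice
toy$AidGap

# Association between family income and the aid proxy
r_aid <- cor(toy$MedIncome, toy$AidGap)
r_aid
```

A clearly negative value on the full data would support the claim that schools with wealthier students give less aid; six rows are far too few to draw that conclusion here.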

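3. Summary

The strongest relationship found is between average SAT score and completion rate (correlation 0.82). Median family income and net price are moderately positively correlated (0.52), total cost and student debt are weakly negatively correlated (-0.21), and the percentages of white and female students are essentially uncorrelated (-0.10). Average undergraduate enrollment is about 4,485 students, and in-state tuition varies widely, with a variance of 199,665,280 dollars².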

4. Appendix

# Q1 code: mean(college$Enrollment, na.rm = TRUE)

# Q2 code: var(college$TuitionIn, use = "complete.obs")

# Q3 code: hist(college$Cost, main = "Distribution of Total Cost of Attendance", xlab = "Total Cost ($)", col = "lightblue", breaks = 30)

# Q4 code: hist(college$MedIncome, main = "Histogram of Median Family Income", xlab = "Median Family Income ($1000s)")

# Q5 code: cor(college$White, college$Female, use = "complete.obs")

# Q6 code: plot(college$Pell, college$MedIncome, main = "Pell Grant Percentage vs Median Family Income", xlab = "Percent Receiving Pell Grants", ylab = "Median Family Income ($1000s)", col = "darkgreen")

# Q7 code: cor(college$AvgSAT, college$CompRate, use = "complete.obs")

# Q8 code: boxplot(college$Enrollment ~ college$Online, main = "Undergraduate Enrollment: Online vs On-Campus Schools", xlab = "Online-Only School", ylab = "Undergraduate Enrollment", names = c("No", "Yes"), col = c("lightblue", "lightgreen"))

# Q9 code: cor(college$Cost, college$Debt, use = "complete.obs")
# Q9 code: plot(college$Cost, college$Debt, main = "Average Total Cost vs Average Student Debt", xlab = "Average Total Cost ($)", ylab = "Average Student Debt ($)", col = "darkblue")

# Q10 code: plot(college$MedIncome, college$NetPrice, main = "Median Family Income vs Average Net Price", xlab = "Median Family Income ($1000s)", ylab = "Average Net Price ($)", col = "darkorange")
# Q10 code: cor(college$MedIncome, college$NetPrice, use = "complete.obs")