Introduction

We use the CollegeScores4yr dataset for our data analysis.

We propose the following 10 questions based on our understanding of the data.

Analysis

We will explore the questions in detail.

# Load the CollegeScores4yr data and preview the first six rows
college <- read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)
##                                  Name State     ID Main
## 1            Alabama A & M University    AL 100654    1
## 2 University of Alabama at Birmingham    AL 100663    1
## 3                  Amridge University    AL 100690    1
## 4 University of Alabama in Huntsville    AL 100706    1
## 5            Alabama State University    AL 100724    1
## 6           The University of Alabama    AL 100751    1
##                                                                Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
##   MainDegree HighDegree Control    Region Locale Latitude Longitude AdmitRate
## 1          3          4  Public Southeast   City 34.78337 -86.56850    0.9027
## 2          3          4  Public Southeast   City 33.50570 -86.79935    0.9181
## 3          3          4 Private Southeast   City 32.36261 -86.17401        NA
## 4          3          4  Public Southeast   City 34.72456 -86.64045    0.8123
## 5          3          4  Public Southeast   City 32.36432 -86.29568    0.9787
## 6          3          4  Public Southeast   City 33.21187 -87.54598    0.5330
##   MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1     18    929      0       4824   2.5  90.7      0.9   0.2   5.6      6.6
## 2     25   1195      0      12866  57.8  25.9      3.3   5.9   7.1     25.2
## 3     NA     NA      1        322   7.1  14.3      0.6   0.3  77.6     54.4
## 4     28   1322      0       6917  74.2  10.7      4.6   4.0   6.5     15.0
## 5     18    935      0       4189   1.5  93.8      1.0   0.3   3.5      7.7
## 6     28   1278      0      32387  78.5  10.1      4.7   1.2   5.6      7.9
##   NetPrice  Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1    15184 22886      9857     18236       9227        7298      6983
## 2    17535 24129      8328     19032      11612       17235     10640
## 3     9649 15080      6900      6900      14738        5265      3866
## 4    19986 22108     10280     21480       8727        9748      9391
## 5    12874 19413     11068     19396       9003        7983      7399
## 6    21973 28836     10780     28100      13574       10894     10016
##   FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1        71.3 71.0    23.96 1068   56.4     36.6      23.6
## 2        89.9 35.3    52.92 3755   63.9     34.1      34.5
## 3       100.0 74.2    18.18  109   64.9     51.3      15.0
## 4        64.6 27.7    48.62 1347   47.6     31.0      44.8
## 5        54.2 73.8    27.69 1294   61.3     34.3      22.1
## 6        74.0 18.0    67.87 6430   61.5     22.6      66.7

Q1: What is the average in-state tuition for the colleges?

mean_tuition <- mean(college$TuitionIn, na.rm = TRUE)
cat("1. Mean in-state tuition:", mean_tuition, "\n")
## 1. Mean in-state tuition: 21948.55

The mean in-state tuition is $21,948.55.

Q2: What is the median in-state tuition for the colleges?

median_tuition <- median(college$TuitionIn, na.rm = TRUE)
cat("2. Median in-state tuition:", median_tuition, "\n")
## 2. Median in-state tuition: 17662

The median in-state tuition is $17,662.
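
The mean from Q1 sits above this median (21948.55 - 17662 = 4286.55), which is often a sign of a right-skewed distribution. A minimal sketch of that comparison follows. `center_gap` is a hypothetical helper name, and the input is only the six TuitionIn values printed by head(college) above, so the result describes those six rows and not the full dataset.

```r
# Hypothetical helper: mean, median, and their difference for one variable.
# A positive gap (mean above median) hints at right skew; a negative gap at left skew.
center_gap <- function(x) {
  x <- x[!is.na(x)]  # drop missing values, as na.rm = TRUE does
  c(mean = mean(x), median = median(x), gap = mean(x) - median(x))
}

# First six TuitionIn values, copied from the head(college) output above
head_tuition <- c(9857, 8328, 6900, 10280, 11068, 10780)

res <- center_gap(head_tuition)
print(res)
```

Passing college$TuitionIn instead of head_tuition would run the same comparison on every school.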

Q3: What is the standard deviation of average student debt?

sd_debt <- sd(college$Debt, na.rm = TRUE)
cat("3. Standard deviation of student debt:", sd_debt, "\n")
## 3. Standard deviation of student debt: 5360.986

The standard deviation of student debt is $5,360.99.

Q4: What is the distribution of completion rate?

hist(college$CompRate,
     main = "Histogram of Completion Rate",
     xlab = "Completion Rate",
     col = "lightblue",
     border = "black")

Based on the histogram, a large majority of schools have completion rates between 40% and 60%.
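
A reading taken from a histogram can be checked numerically by computing the share of schools that fall inside the stated range. The sketch below is one way to do that. `share_between` is a hypothetical helper name, and the input is only the six CompRate values shown in the head(college) output, so the share it prints says nothing about the full dataset.

```r
# Hypothetical helper: fraction of non-missing values of x inside [lo, hi]
share_between <- function(x, lo, hi) {
  x <- x[!is.na(x)]
  mean(x >= lo & x <= hi)
}

# First six CompRate values, copied from the head(college) output above
head_rates <- c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87)

share_40_60 <- share_between(head_rates, 40, 60)
print(share_40_60)
```

Calling share_between(college$CompRate, 40, 60) would give the share for all schools, which can then be compared with the claim above.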

Q5: What are the average SAT scores?

boxplot(college$AvgSAT,
        main = "Boxplot of Average SAT Scores",
        ylab = "Average SAT Score",
        col = "lightgreen")

Based on the boxplot, a typical school's average SAT score is between 1100 and 1200.
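
The numbers behind a boxplot can be printed directly, which avoids reading them off the axis. In R, boxplot() builds its box from the five-number summary that fivenum() returns (minimum, lower hinge, median, upper hinge, maximum). The sketch below applies it to the six AvgSAT values shown in the head(college) output only, so these are not the values for the full dataset.

```r
# First six AvgSAT values, copied from the head(college) output above
head_sat <- c(929, 1195, NA, 1322, 935, 1278)

# fivenum() drops NA values by default (na.rm = TRUE)
sat_five <- fivenum(head_sat)
print(sat_five)  # minimum, lower hinge, median, upper hinge, maximum
```

Running fivenum(college$AvgSAT) would show where the median and the middle half of schools actually fall.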

Q6: How many colleges are in each region?

barplot(table(college$Region),
        main = "Number of Colleges by Region",
        ylab = "Number of Colleges",
        col = "lightcoral")

The Northeast has the most colleges of any region.

Q7: What is the ratio of public vs private schools?

control_counts <- table(college$Control)
print(control_counts)
## 
## Private  Profit  Public 
##    1243     170     599
pie(control_counts,
    main = "Ratio of Public vs Private Schools",
    # one color per category (Private, Profit, Public) so no color is reused
    col = c("lightblue", "lightgreen", "lightpink"),
    labels = paste(names(control_counts), "\n", control_counts))

There are just over twice as many private schools (1,243) as public schools (599); the remaining 170 schools are in the Profit category.
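
The ratio can be computed from the printed counts instead of being estimated from the pie chart. The sketch below re-enters the three counts from the table output above by hand, so that it runs without downloading the data; the variable names are illustrative.

```r
# Counts copied from the table(college$Control) output above
control_counts <- c(Private = 1243, Profit = 170, Public = 599)

# Private-to-public ratio
private_to_public <- unname(control_counts["Private"] / control_counts["Public"])
print(round(private_to_public, 2))

# Percentage of schools in each category
control_pct <- round(100 * prop.table(control_counts), 1)
print(control_pct)
```

With the data loaded, the same two lines work on table(college$Control) directly.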

Q8: What is the correlation between average SAT score and completion rate?

cor_sat_comp <- cor(college$AvgSAT, college$CompRate, use = "complete.obs")
cat("8. Correlation between AvgSAT and Completion Rate:", cor_sat_comp, "\n")
## 8. Correlation between AvgSAT and Completion Rate: 0.8189495

There is a strong positive correlation (about 0.82) between average SAT score and completion rate.
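
Because AvgSAT contains missing values, the correlation is computed only on schools that report both variables. The sketch below illustrates what use = "complete.obs" does, using the six rows shown in the head(college) output (one of which has a missing AvgSAT). The result describes those rows only and is not the 0.82 reported above.

```r
# First six AvgSAT and CompRate values, copied from the head(college) output above
head_sat  <- c(929, 1195, NA, 1322, 935, 1278)
head_comp <- c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87)

# use = "complete.obs" drops every row where either variable is NA
r_complete <- cor(head_sat, head_comp, use = "complete.obs")

# Equivalent manual version: keep only rows with both values present
keep <- complete.cases(head_sat, head_comp)
r_manual <- cor(head_sat[keep], head_comp[keep])

print(sum(keep))                        # number of rows actually used
print(all.equal(r_complete, r_manual))  # the two approaches agree
```

Reporting sum(complete.cases(college$AvgSAT, college$CompRate)) alongside the correlation would show how many schools the estimate is based on.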

Q9: What is the average admissions rate for all colleges?

mean_admit <- mean(college$AdmitRate, na.rm = TRUE)
cat("9. Average admission rate:", mean_admit, "\n")
## 9. Average admission rate: 0.6702025

The average admission rate across colleges is about 67%.
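
This average covers only schools that report an admission rate, since na.rm = TRUE silently drops missing values. The sketch below shows the effect on the six AdmitRate values from the head(college) output, one of which is NA; the numbers describe those six rows and not the full dataset.

```r
# First six AdmitRate values, copied from the head(college) output above
head_admit <- c(0.9027, 0.9181, NA, 0.8123, 0.9787, 0.5330)

n_missing <- sum(is.na(head_admit))           # how many values are dropped
mean_raw  <- mean(head_admit)                 # NA, because one value is missing
mean_used <- mean(head_admit, na.rm = TRUE)   # mean of the remaining values

print(n_missing)
print(mean_raw)
print(mean_used)
```

Running sum(is.na(college$AdmitRate)) on the full data would show how many schools are excluded from the 67% figure.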

Q10: What is the median family income of students across all colleges?

median_income <- median(college$MedIncome, na.rm = TRUE)
cat("10. Median family income:", median_income, "\n")
## 10. Median family income: 42.6

The median family income is $42,600 (MedIncome is recorded in thousands of dollars, so 42.6 corresponds to $42,600).

Summary

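Our analysis answered all 10 questions. In-state tuition has a mean of $21,948.55 and a median of $17,662, the average admission rate is about 67%, and the median family income is $42,600. Average SAT score and completion rate are strongly correlated (about 0.82), and private schools outnumber public schools by just over two to one.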

Appendix

# Q1 code
# Q2 code
# Q3 code
# Q4 code
# Q5 code
# Q6 code
# Q7 code
# Q8 code
# Q9 code
# Q10 code