1. Introduction

We use the CollegeScores4yr dataset from the Lock5Stat website to explore various aspects of US colleges. Based on our own reading of the data, we propose ten questions about metrics such as enrollment, SAT scores, admission rates, faculty salaries, and more. They are listed as Q1 through Q10 in the Analysis section.

Analysis

Each question below is followed by its result, and the R code for every question is collected in the Appendix.

college <- read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)
##                                  Name State     ID Main
## 1            Alabama A & M University    AL 100654    1
## 2 University of Alabama at Birmingham    AL 100663    1
## 3                  Amridge University    AL 100690    1
## 4 University of Alabama in Huntsville    AL 100706    1
## 5            Alabama State University    AL 100724    1
## 6           The University of Alabama    AL 100751    1
##                                                                Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
##   MainDegree HighDegree Control    Region Locale Latitude Longitude AdmitRate
## 1          3          4  Public Southeast   City 34.78337 -86.56850    0.9027
## 2          3          4  Public Southeast   City 33.50570 -86.79935    0.9181
## 3          3          4 Private Southeast   City 32.36261 -86.17401        NA
## 4          3          4  Public Southeast   City 34.72456 -86.64045    0.8123
## 5          3          4  Public Southeast   City 32.36432 -86.29568    0.9787
## 6          3          4  Public Southeast   City 33.21187 -87.54598    0.5330
##   MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1     18    929      0       4824   2.5  90.7      0.9   0.2   5.6      6.6
## 2     25   1195      0      12866  57.8  25.9      3.3   5.9   7.1     25.2
## 3     NA     NA      1        322   7.1  14.3      0.6   0.3  77.6     54.4
## 4     28   1322      0       6917  74.2  10.7      4.6   4.0   6.5     15.0
## 5     18    935      0       4189   1.5  93.8      1.0   0.3   3.5      7.7
## 6     28   1278      0      32387  78.5  10.1      4.7   1.2   5.6      7.9
##   NetPrice  Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1    15184 22886      9857     18236       9227        7298      6983
## 2    17535 24129      8328     19032      11612       17235     10640
## 3     9649 15080      6900      6900      14738        5265      3866
## 4    19986 22108     10280     21480       8727        9748      9391
## 5    12874 19413     11068     19396       9003        7983      7399
## 6    21973 28836     10780     28100      13574       10894     10016
##   FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1        71.3 71.0    23.96 1068   56.4     36.6      23.6
## 2        89.9 35.3    52.92 3755   63.9     34.1      34.5
## 3       100.0 74.2    18.18  109   64.9     51.3      15.0
## 4        64.6 27.7    48.62 1347   47.6     31.0      44.8
## 5        54.2 73.8    27.69 1294   61.3     34.3      22.1
## 6        74.0 18.0    67.87 6430   61.5     22.6      66.7
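The head() output already shows missing values (NA) for Amridge University in MidACT, AvgSAT, and AdmitRate, which is why the later calculations pass na.rm = TRUE or use = "complete.obs". The sketch below counts NAs per column. As an assumption for illustration, it rebuilds only the six rows printed above (three numeric columns plus Name), not the full dataset.

```r
# Stand-in for the full data: the six rows printed by head(college) above,
# restricted to a few columns. This is NOT the full dataset.
rows <- data.frame(
  Name      = c("Alabama A & M University",
                "University of Alabama at Birmingham",
                "Amridge University",
                "University of Alabama in Huntsville",
                "Alabama State University",
                "The University of Alabama"),
  MidACT    = c(18, 25, NA, 28, 18, 28),
  AvgSAT    = c(929, 1195, NA, 1322, 935, 1278),
  AdmitRate = c(0.9027, 0.9181, NA, 0.8123, 0.9787, 0.5330)
)

# Number of missing values in each column
na_counts <- colSums(is.na(rows))
print(na_counts)

# Rows with no missing values in any of these columns
complete_rows <- rows[complete.cases(rows), ]
print(nrow(complete_rows))
```

On the full data, the same check would be colSums(is.na(college)).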

Q1: What is the mean enrollment across all colleges?

## [1] 4484.831

The mean enrollment across all colleges is 4484.831, or about 4,485 students per college.
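The mean is simply the total divided by the number of values. As a small check, the sketch below uses only the six Enrollment values printed by head() above (an illustration, not the full dataset) and confirms that mean() matches the manual calculation.

```r
# Enrollment of the six colleges printed by head(college) above
# (illustration only; the full-data mean is 4484.831)
enrollment <- c(4824, 12866, 322, 6917, 4189, 32387)

# Arithmetic mean: sum of the values divided by how many there are
manual_mean  <- sum(enrollment) / length(enrollment)
builtin_mean <- mean(enrollment)

print(c(manual = manual_mean, builtin = builtin_mean))
```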

Q2: What is the median of the average SAT scores for all colleges?

## [1] 1121

The median of the average SAT scores, taken over the colleges that report one, is 1121.
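Not every college reports an average SAT score, and a single NA makes median() return NA unless na.rm = TRUE is set. The sketch below shows this on the six AvgSAT values printed by head() above (an illustration, not the full dataset), where Amridge University has no SAT data.

```r
# AvgSAT for the six colleges printed above; Amridge University is NA
avg_sat <- c(929, 1195, NA, 1322, 935, 1278)

# Without na.rm, one missing value makes the result NA
median_with_na <- median(avg_sat)

# With na.rm = TRUE, the five reported values are sorted and the middle one is taken
median_reported <- median(avg_sat, na.rm = TRUE)

print(median_with_na)   # NA
print(median_reported)  # 1195
```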

Q3: What is the variance of the admission rates?

## [1] 0.04333848

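The variance of admission rates is 0.04333848. Because admission rates are proportions, the variance is in squared units; its square root, a standard deviation of about 0.208 (roughly 21 percentage points), is easier to interpret.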
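var() computes the sample variance: the sum of squared deviations from the mean divided by n - 1. The sketch below checks this against a manual calculation on the six AdmitRate values printed by head() above (an illustration only, with the NA dropped) and then converts the full-data variance reported above into a standard deviation.

```r
# AdmitRate for the six colleges printed above (one value is missing)
admit <- c(0.9027, 0.9181, NA, 0.8123, 0.9787, 0.5330)
x <- admit[!is.na(admit)]

# Sample variance: squared deviations from the mean, divided by n - 1
manual_var  <- sum((x - mean(x))^2) / (length(x) - 1)
builtin_var <- var(admit, na.rm = TRUE)
print(c(manual = manual_var, builtin = builtin_var))

# Square root of the full-data variance reported above
full_sd <- sqrt(0.04333848)
print(round(full_sd, 3))  # 0.208
```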

Q4: What is the standard deviation of the percentage of first-generation students?

## [1] 11.08522

The standard deviation of the percentage of first-generation students is 11.08522 percentage points.
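sd() returns the sample standard deviation, which divides by n - 1; the population version divides by n. The sketch below compares the two on the six FirstGen values printed by head() above (an illustration, not the full dataset). With thousands of colleges the difference is negligible, but it is visible for six values.

```r
# FirstGen (% first-generation students) for the six colleges printed above
first_gen <- c(36.6, 34.1, 51.3, 31.0, 34.3, 22.6)
n <- length(first_gen)

# Sample standard deviation (n - 1 in the denominator)
sample_sd <- sd(first_gen)

# Population standard deviation (n in the denominator)
population_sd <- sqrt(sum((first_gen - mean(first_gen))^2) / n)

# They differ by a factor of sqrt((n - 1) / n), which approaches 1 as n grows
print(c(sample = sample_sd, population = population_sd))
```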

Q5: What is the correlation between net price and average debt?

## [1] -0.1091143

The correlation between net price and average debt is -0.1091143, which indicates only a weak negative linear relationship.
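The Pearson correlation returned by cor() is the covariance scaled by both standard deviations. The sketch below checks that identity on the six NetPrice and Debt values printed by head() above. These six rows are only an illustration, and their correlation is not expected to match the full-data value. It then squares the full-data correlation: about 1.2% of the variation in one variable is linearly associated with the other.

```r
# NetPrice and Debt for the six colleges printed above (illustration only)
net_price <- c(15184, 17535, 9649, 19986, 12874, 21973)
debt      <- c(1068, 3755, 109, 1347, 1294, 6430)

# Pearson correlation: covariance divided by the product of standard deviations
manual_r  <- cov(net_price, debt) / (sd(net_price) * sd(debt))
builtin_r <- cor(net_price, debt, use = "complete.obs")
print(c(manual = manual_r, builtin = builtin_r))

# Squared full-data correlation reported above
print(round((-0.1091143)^2, 4))  # 0.0119
```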

Q6: What does the distribution of in-state tuition fees look like across all colleges?

The distribution of in-state tuition fees is right-skewed: most colleges have relatively low in-state tuition, and a smaller number of high-tuition colleges form a long tail to the right.
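A histogram is the visual check for skew; a number can back it up. Base R has no built-in skewness function, so the sketch below defines a hypothetical helper, skewness(), using the moment formula (mean cubed deviation over the cubed population standard deviation). It is tested on made-up vectors, not on the college data. A positive value, or a mean above the median, is consistent with a right-skewed shape.

```r
# Hypothetical helper (not base R): moment-based skewness.
# Positive values indicate a longer right tail.
skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Made-up values: mostly small, one large (right-skewed)
right_skewed <- c(1, 1, 2, 2, 3, 10)
symmetric    <- c(1, 2, 3)

print(skewness(right_skewed))                     # positive
print(skewness(symmetric))                        # 0
print(mean(right_skewed) > median(right_skewed))  # TRUE
```

Applied to the full data, the call would be skewness(college$TuitionIn).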

Q7: How does the average monthly salary for full-time faculty vary across control types (Private, Public, Profit)?

Faculty salaries vary by control type, with private colleges typically paying the highest salaries.
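A boxplot compares groups visually; the group medians, which are the center lines of the boxes, can also be printed with tapply(). The sketch below uses the six FacSalary and Control values printed by head() above. They include only one private college, so they illustrate the call and cannot confirm or contradict the full-data pattern.

```r
# FacSalary and Control for the six colleges printed above (illustration only)
fac_salary <- c(6983, 10640, 3866, 9391, 7399, 10016)
control    <- c("Public", "Public", "Private", "Public", "Public", "Public")

# Median salary within each control type
group_medians <- tapply(fac_salary, control, median)
print(group_medians)
```

On the full data, the equivalent call would be tapply(college$FacSalary, college$Control, median, na.rm = TRUE).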

Q8: What is the percentage distribution of schools across different US regions?

The share of schools varies by region: the Northeast has the largest share at 27.4%, and Territory has the smallest at 2.4%.
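The percentages come from counting schools per region with table() and dividing by the total with prop.table(). The sketch below uses hypothetical region labels for five schools (not from the dataset) and sorts the result from largest to smallest share.

```r
# Hypothetical region labels for five schools (not from the dataset)
regions <- c("Northeast", "Northeast", "Midwest", "Southeast", "West")

# Counts per region, converted to percentages of all schools
region_pct <- prop.table(table(regions)) * 100

# Largest share first, rounded to one decimal place for reporting
print(round(sort(region_pct, decreasing = TRUE), 1))
```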

Q9: How do admission rates compare across different regions?

Mean admission rates vary by region: the Midwest has the highest and the Southeast the lowest.
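tapply() splits admission rates by region and applies mean() to each group; arguments after FUN, such as na.rm = TRUE, are passed through to mean(), so missing rates are dropped within each region. The sketch below uses hypothetical rates and regions (not from the dataset). Note that each college counts equally in its region's mean, regardless of enrollment.

```r
# Hypothetical admission rates and regions (not from the dataset)
admit_rate <- c(0.9, 0.7, NA, 0.5, 0.6)
region     <- c("Midwest", "Midwest", "Southeast", "Southeast", "West")

# Mean admission rate per region, dropping NAs within each group
region_means <- tapply(admit_rate, region, mean, na.rm = TRUE)

# Regions ordered from highest to lowest mean admission rate
print(sort(region_means, decreasing = TRUE))
print(names(which.max(region_means)))  # "Midwest"
print(names(which.min(region_means)))  # "Southeast"
```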

Q10: What is the range of the Pell grant percentage across colleges?

## [1]  0.0 97.6
## [1] 97.6

The Pell grant percentage ranges from 0 to 97.6 across colleges, so the range is 97.6 percentage points.
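range() returns the minimum and maximum as a pair, and diff() of that pair gives the range as a single number. The sketch below uses the six Pell values printed by head() above (Pell grant percentage, per the appendix comment), so its result is an illustration, not the full-data range.

```r
# Pell (Pell grant percentage) for the six colleges printed above
pell <- c(71.0, 35.3, 74.2, 27.7, 73.8, 18.0)

# range() gives c(min, max); diff() turns that pair into max - min
pell_range  <- range(pell, na.rm = TRUE)
pell_spread <- diff(pell_range)

print(pell_range)   # 18.0 74.2
print(pell_spread)  # 56.2
```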

Summary

This project analyzed various aspects of college data using R. We explored key questions related to enrollment, SAT scores, tuition fees, faculty salaries, financial aid, and admission rates across different regions and institution types.

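Key findings include: the mean enrollment is about 4,485 students; the median average SAT score is 1121; net price and average debt are only weakly and negatively correlated (r = -0.109); in-state tuition is right-skewed; private colleges tend to pay the highest faculty salaries; the Northeast has the largest share of schools (27.4%); and the Midwest has the highest mean admission rate, while the Southeast has the lowest.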

Appendix: R Code

# Load data
college <- read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")

# Q1: Mean enrollment
mean(college$Enrollment, na.rm = TRUE)

# Q2: Median of average SAT scores
# (defensive coercion in case AvgSAT is read as text; NAs are dropped)
college$AvgSAT <- as.numeric(as.character(college$AvgSAT))
median(college$AvgSAT, na.rm = TRUE)

# Q3: Variance of admission rates
var(college$AdmitRate, na.rm = TRUE)

# Q4: Standard deviation of the percentage of first-generation students
sd(college$FirstGen, na.rm = TRUE)

# Q5: Correlation between net price and average debt (complete pairs only)
cor(college$NetPrice, college$Debt, use = "complete.obs")

# Q6: Distribution of in-state tuition fees
hist(college$TuitionIn, main = "Distribution of In-State Tuition Fees",
     xlab = "In-State Tuition Fees ($)", ylab = "Frequency",
     col = "lightblue", border = "black", breaks = 20)

# Q7: Faculty salaries across control types
boxplot(FacSalary ~ Control, data = college,
        main = "Faculty Salary by Control Type",
        xlab = "Control", ylab = "Average Monthly Faculty Salary ($)")

# Q8: Percentage distribution of schools across regions
region_counts <- table(college$Region)
region_percentages <- prop.table(region_counts) * 100
pie(region_percentages,
    labels = paste0(names(region_percentages), " (", round(region_percentages, 1), "%)"),
    main = "Schools by Region")

# Q9: Mean admission rate by region
region_admit_rate <- tapply(college$AdmitRate, college$Region, mean, na.rm = TRUE)
barplot(region_admit_rate, main = "Mean Admission Rate by Region",
        ylab = "Mean Admission Rate", las = 2)

# Q10: Range of Pell grant percentage
range_pell <- range(college$Pell, na.rm = TRUE)
range_pell[2] - range_pell[1]