1. Introduction

In this project, we explore the CollegeScores4yr data set to better understand the characteristics of colleges in the United States. We focus on variables such as tuition, enrollment, faculty salaries, test scores, net price, and student demographics. Using techniques from Chapter 6, we answer ten simple questions with basic statistical measures (mean, median, standard deviation, variance, and correlation) and graphical tools (histograms and boxplots).

All analyses are performed in R using RStudio, which lets us summarize the data numerically and visualize important patterns.

We propose the following ten questions based on our own understanding of the data.

2. Analysis

We first load the data and preview its first six rows, then answer each question in turn.

# Load the CollegeScores4yr data from the Lock5 website
college = read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
# Preview the first six rows
head(college)
##                                  Name State     ID Main
## 1            Alabama A & M University    AL 100654    1
## 2 University of Alabama at Birmingham    AL 100663    1
## 3                  Amridge University    AL 100690    1
## 4 University of Alabama in Huntsville    AL 100706    1
## 5            Alabama State University    AL 100724    1
## 6           The University of Alabama    AL 100751    1
##                                                                Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
##   MainDegree HighDegree Control    Region Locale Latitude Longitude AdmitRate
## 1          3          4  Public Southeast   City 34.78337 -86.56850    0.9027
## 2          3          4  Public Southeast   City 33.50570 -86.79935    0.9181
## 3          3          4 Private Southeast   City 32.36261 -86.17401        NA
## 4          3          4  Public Southeast   City 34.72456 -86.64045    0.8123
## 5          3          4  Public Southeast   City 32.36432 -86.29568    0.9787
## 6          3          4  Public Southeast   City 33.21187 -87.54598    0.5330
##   MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1     18    929      0       4824   2.5  90.7      0.9   0.2   5.6      6.6
## 2     25   1195      0      12866  57.8  25.9      3.3   5.9   7.1     25.2
## 3     NA     NA      1        322   7.1  14.3      0.6   0.3  77.6     54.4
## 4     28   1322      0       6917  74.2  10.7      4.6   4.0   6.5     15.0
## 5     18    935      0       4189   1.5  93.8      1.0   0.3   3.5      7.7
## 6     28   1278      0      32387  78.5  10.1      4.7   1.2   5.6      7.9
##   NetPrice  Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1    15184 22886      9857     18236       9227        7298      6983
## 2    17535 24129      8328     19032      11612       17235     10640
## 3     9649 15080      6900      6900      14738        5265      3866
## 4    19986 22108     10280     21480       8727        9748      9391
## 5    12874 19413     11068     19396       9003        7983      7399
## 6    21973 28836     10780     28100      13574       10894     10016
##   FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1        71.3 71.0    23.96 1068   56.4     36.6      23.6
## 2        89.9 35.3    52.92 3755   63.9     34.1      34.5
## 3       100.0 74.2    18.18  109   64.9     51.3      15.0
## 4        64.6 27.7    48.62 1347   47.6     31.0      44.8
## 5        54.2 73.8    27.69 1294   61.3     34.3      22.1
## 6        74.0 18.0    67.87 6430   61.5     22.6      66.7

Q1: What is the mean of in-state tuition and fees in the data?

mean(college$TuitionIn, na.rm = TRUE)
## [1] 21948.55

The mean in-state tuition and fees across colleges is $21,948.55.
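To see why `na.rm = TRUE` matters, here is a small sketch using three `TuitionIn` values from the `head()` output above plus one artificial `NA` standing in for a missing entry (the vector `tuition` is ours, for illustration only). Without `na.rm = TRUE`, a single missing value makes the whole mean `NA`.

```r
# Three TuitionIn values from the head() output, plus an artificial NA
tuition <- c(9857, 8328, NA, 10280)

# Default behavior: any NA propagates, so the result is NA
mean_with_na <- mean(tuition)

# na.rm = TRUE drops the NA and averages the remaining three values
mean_without_na <- mean(tuition, na.rm = TRUE)

# Number of values that were dropped
n_missing <- sum(is.na(tuition))

print(mean_with_na)     # NA
print(mean_without_na)  # 9488.333
print(n_missing)        # 1
```

On the full data set, `sum(is.na(college$TuitionIn))` reports how many colleges were left out of the Q1 mean.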

Q2: What is the median of undergraduate enrollment?

median(college$Enrollment, na.rm = TRUE)
## [1] 1722

The median undergraduate enrollment is 1722.
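One reason to prefer the median for enrollment is that it is resistant to extreme values. A minimal sketch with made-up enrollments (not from the data set), where one very large university pulls the mean far above the typical college while the median stays put:

```r
# Hypothetical enrollments: four small colleges and one very large university
enrollment <- c(300, 800, 1500, 2000, 40000)

avg <- mean(enrollment)    # 8920: pulled upward by the single large value
med <- median(enrollment)  # 1500: the middle value, unaffected by the outlier

# Replacing the largest value with a modest one changes the mean a lot...
mean(c(300, 800, 1500, 2000, 2500))    # 1420
# ...but the median stays the same
median(c(300, 800, 1500, 2000, 2500))  # 1500
```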

Q3: What is the distribution of average monthly salary for full-time faculty?

hist(college$FacSalary, main = "Distribution of Faculty Salaries", xlab = "Average Monthly Salary", col = "yellow")

The histogram above shows how the average monthly salary of full-time faculty is distributed across colleges.
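A histogram simply counts how many observations fall into each bin. The sketch below uses hypothetical monthly salaries (not from the data set) and fixed $3,000-wide bins, and calls `hist()` with `plot = FALSE` so that it returns the bin edges and counts instead of drawing them. By default R's bins are right-closed, so a value equal to an upper edge falls in that bin.

```r
# Hypothetical average monthly salaries, in dollars
salary <- c(4000, 6500, 7200, 8800, 9500, 10100, 13000)

# Compute the histogram without drawing it, using bins 3000-6000, 6000-9000, ...
h <- hist(salary, breaks = seq(3000, 15000, by = 3000), plot = FALSE)

# One row per bin: lower edge, upper edge, and number of salaries in the bin
bins <- data.frame(lower = head(h$breaks, -1),
                   upper = tail(h$breaks, -1),
                   count = h$counts)
print(bins)  # counts: 1, 3, 2, 1
```

Passing a `breaks` argument to the Q3 `hist()` call is one way to control the bin width on the real data.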

Q4: What is the correlation between average SAT score and completion rate?

cor(college$AvgSAT, college$CompRate, use = "complete.obs")
## [1] 0.8189495

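The correlation between average SAT score and completion rate is 0.819, a strong positive linear relationship: colleges with higher average SAT scores tend to have higher completion rates.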
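Pearson's correlation is the sum of products of deviations divided by (n - 1) times the two sample standard deviations: r = sum((x - mean(x)) * (y - mean(y))) / ((n - 1) * sd(x) * sd(y)). The sketch below checks this formula against `cor()` on the six `AvgSAT` and `CompRate` values from the `head()` output, where Amridge University has no SAT score. With `use = "complete.obs"`, `cor()` drops any row missing either variable, which we mimic with `complete.cases()`.

```r
# AvgSAT and CompRate for the six colleges in the head() output (row 3 has no SAT)
sat  <- c(929, 1195, NA, 1322, 935, 1278)
comp <- c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87)

# Built-in correlation, dropping rows with a missing value in either variable
r_builtin <- cor(sat, comp, use = "complete.obs")

# Manual version: keep only rows where both values are observed
ok <- complete.cases(sat, comp)
x <- sat[ok]
y <- comp[ok]
n <- length(x)
r_manual <- sum((x - mean(x)) * (y - mean(y))) / ((n - 1) * sd(x) * sd(y))

print(c(builtin = r_builtin, manual = r_manual))  # the two values agree
```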

Q5: What does the boxplot show about first-generation college students?

boxplot(college$FirstGen, main = "Boxplot of First-Generation Students", ylab = "Percent of First-Gen Students", col = "orange")

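The boxplot above summarizes the distribution of the percentage of first-generation students across colleges. The box spans roughly the middle 50% of colleges (from the first to the third quartile), the line inside the box marks the median, and any points beyond the whiskers are potential outliers.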
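R's `boxplot()` computes its box and whiskers with `boxplot.stats()`. A sketch with hypothetical percentages (not from the data set): the box runs between the lower and upper hinges (close to the first and third quartiles), each whisker extends to the most extreme value within 1.5 times the box length, and anything beyond that is returned in `$out` and drawn as a separate point.

```r
# Hypothetical percentages of first-generation students at seven colleges
first_gen <- c(10, 20, 25, 30, 35, 40, 95)

b <- boxplot.stats(first_gen)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
print(b$stats)  # 10.0 22.5 30.0 37.5 40.0

# Box length is 37.5 - 22.5 = 15, so the upper fence is 37.5 + 1.5 * 15 = 60
print(b$out)    # 95 lies beyond the fence and is flagged as an outlier
```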

Q6: What is the standard deviation of faculty salary?

sd(college$FacSalary, na.rm = TRUE)
## [1] 2563.004

The standard deviation of faculty salary is about $2,563, so average monthly salaries typically differ from the overall mean by roughly that amount.
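The sample standard deviation used by `sd()` divides by n - 1: s = sqrt(sum((x - mean(x))^2) / (n - 1)). A short check of this formula against `sd()` on hypothetical salaries (not from the data set):

```r
# Hypothetical average monthly salaries, in dollars
salary <- c(4000, 6500, 7200, 8800, 9500, 10100, 13000)

n <- length(salary)
deviations <- salary - mean(salary)        # distance of each salary from the mean
sd_manual <- sqrt(sum(deviations^2) / (n - 1))

print(c(manual = sd_manual, builtin = sd(salary)))  # identical up to rounding
```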

Q7: What is the distribution of net price?

hist(college$NetPrice, main = "Distribution of Net Price", xlab = "Net Price", col = "blue")

The histogram above shows the distribution of net price across colleges.
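A histogram shows the shape of a distribution, while percentiles give exact cut points. The sketch below applies `quantile()` (with R's default `type = 7` interpolation) to the six `NetPrice` values shown in the `head()` output; running the same call on `college$NetPrice` with `na.rm = TRUE` would give the corresponding values for the full data set.

```r
# NetPrice for the six colleges in the head() output
net_price <- c(15184, 17535, 9649, 19986, 12874, 21973)

# First quartile, median, and third quartile
q <- quantile(net_price, probs = c(0.25, 0.50, 0.75))
print(q)  # 25%: 13451.50, 50%: 16359.50, 75%: 19373.25
```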

Q8: What is the correlation between family income and average net price?

cor(college$MedIncome, college$NetPrice, use = "complete.obs")
## [1] 0.5151298

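The correlation between median family income and average net price is 0.515, a moderate positive relationship: colleges whose students come from higher-income families tend to have higher net prices.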
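The Q8 call needs `use = "complete.obs"` because the default for `cor()`, `use = "everything"`, returns `NA` as soon as either vector contains a missing value. A sketch using six `MedIncome` and `NetPrice` values from the `head()` output, with one income replaced by an artificial `NA`:

```r
# MedIncome and NetPrice from the head() output; row 5's income set to NA on purpose
income <- c(23.6, 34.5, 15.0, 44.8, NA, 66.7)
price  <- c(15184, 17535, 9649, 19986, 12874, 21973)

r_default  <- cor(income, price)                        # NA: the missing value propagates
r_complete <- cor(income, price, use = "complete.obs")  # uses the five complete pairs

print(r_default)
print(r_complete)
```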

Q9: What is the mean percentage of female students?

mean(college$Female, na.rm = TRUE)
## [1] 59.29588

The mean percentage of female students across colleges is about 59.3%.
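This is an unweighted mean: each college counts once regardless of its size, so it is not the same as the percentage of all students who are female. A sketch with two hypothetical colleges (not from the data set) shows the difference; `weighted.mean()` weights each college's percentage by its enrollment.

```r
# Two hypothetical colleges: percent female and undergraduate enrollment
female_pct <- c(60, 40)
enrollment <- c(1000, 100)

unweighted <- mean(female_pct)                          # 50: each college counts equally
weighted   <- weighted.mean(female_pct, w = enrollment) # about 58.18: larger college counts more

print(c(unweighted = unweighted, weighted = weighted))
```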

Q10: What is the variance of instructional spending per FTE student?

var(college$InstructFTE, na.rm = TRUE)
## [1] 123100321

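The variance of instructional spending per FTE student is 123,100,321. Because variance is measured in squared dollars, its square root, the standard deviation of about $11,095, is easier to interpret.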
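Variance is the square of the standard deviation, which is why its units are squared. A quick check on the six `InstructFTE` values from the `head()` output, followed by converting the Q10 variance back to dollars:

```r
# InstructFTE for the six colleges in the head() output, in dollars
spending <- c(7298, 17235, 5265, 9748, 7983, 10894)

var_s <- var(spending)
sd_s  <- sd(spending)
print(all.equal(var_s, sd_s^2))  # TRUE: variance equals the squared standard deviation

# Standard deviation implied by the Q10 variance on the full data set
sd_q10 <- sqrt(123100321)
print(round(sd_q10, 2))          # 11095.06
```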

3. Summary

Through this project, we applied statistical methods from Chapter 6 to examine various aspects of U.S. colleges using the CollegeScores4yr data set. We used the mean, median, standard deviation, variance, correlation, histograms, and boxplots to answer ten questions: eight focused on a single variable and two on the relationship between a pair of variables.

Our analysis revealed several useful insights. For example, we found the average in-state tuition across colleges, explored how faculty salaries are distributed, and identified the median enrollment size. We also found positive correlations between average SAT score and completion rate (strong, r = 0.82) and between family income and net price (moderate, r = 0.52). By analyzing the percentages of first-generation and female students, as well as instructional spending per student, we gained a broader understanding of both costs and demographics in higher education.

Working with this data set allowed us to apply the concepts from Chapter 6 in a practical context. We were able to explore real trends in college-related variables and interpret the results using basic statistical methods. This experience not only helped us improve our R skills but also deepened our understanding of how data can be used to support analysis and decision-making in education.

4. Appendix (R Code)

# Load the data
college = read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
mean(college$TuitionIn, na.rm = TRUE)  # Q1
median(college$Enrollment, na.rm = TRUE)  # Q2
hist(college$FacSalary, main = "Distribution of Faculty Salaries", xlab = "Average Monthly Salary", col = "yellow")  # Q3
cor(college$AvgSAT, college$CompRate, use = "complete.obs")  # Q4
boxplot(college$FirstGen, main = "Boxplot of First-Generation Students", ylab = "Percent of First-Gen Students", col = "orange")  # Q5
sd(college$FacSalary, na.rm = TRUE)  # Q6
hist(college$NetPrice, main = "Distribution of Net Price", xlab = "Net Price", col = "blue")  # Q7
cor(college$MedIncome, college$NetPrice, use = "complete.obs")  # Q8
mean(college$Female, na.rm = TRUE)  # Q9
var(college$InstructFTE, na.rm = TRUE)  # Q10