Introduction

We use the “CollegeScores4yr” dataset.

I propose the following 10 questions based on my own understanding of the data.

1. What is the mean of Average net price?
2. How many colleges are in each locale?
3. What is the distribution of Average combined SAT scores?
4. What is the mean admission rate among schools?
5. What is the range of faculty salaries across schools?
6. What is the average cost of in-state tuition and fees?
7. How many colleges are in each region?
8. What is the range of completion rate among schools?
9. How does the net price vary across regions?
10. How much does the average debt vary among students?

Analysis

We explore each question in detail.

college <- read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)
##                                  Name State     ID Main
## 1            Alabama A & M University    AL 100654    1
## 2 University of Alabama at Birmingham    AL 100663    1
## 3                  Amridge University    AL 100690    1
## 4 University of Alabama in Huntsville    AL 100706    1
## 5            Alabama State University    AL 100724    1
## 6           The University of Alabama    AL 100751    1
##                                                                Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
##   MainDegree HighDegree Control    Region Locale Latitude Longitude AdmitRate
## 1          3          4  Public Southeast   City 34.78337 -86.56850    0.9027
## 2          3          4  Public Southeast   City 33.50570 -86.79935    0.9181
## 3          3          4 Private Southeast   City 32.36261 -86.17401        NA
## 4          3          4  Public Southeast   City 34.72456 -86.64045    0.8123
## 5          3          4  Public Southeast   City 32.36432 -86.29568    0.9787
## 6          3          4  Public Southeast   City 33.21187 -87.54598    0.5330
##   MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1     18    929      0       4824   2.5  90.7      0.9   0.2   5.6      6.6
## 2     25   1195      0      12866  57.8  25.9      3.3   5.9   7.1     25.2
## 3     NA     NA      1        322   7.1  14.3      0.6   0.3  77.6     54.4
## 4     28   1322      0       6917  74.2  10.7      4.6   4.0   6.5     15.0
## 5     18    935      0       4189   1.5  93.8      1.0   0.3   3.5      7.7
## 6     28   1278      0      32387  78.5  10.1      4.7   1.2   5.6      7.9
##   NetPrice  Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1    15184 22886      9857     18236       9227        7298      6983
## 2    17535 24129      8328     19032      11612       17235     10640
## 3     9649 15080      6900      6900      14738        5265      3866
## 4    19986 22108     10280     21480       8727        9748      9391
## 5    12874 19413     11068     19396       9003        7983      7399
## 6    21973 28836     10780     28100      13574       10894     10016
##   FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1        71.3 71.0    23.96 1068   56.4     36.6      23.6
## 2        89.9 35.3    52.92 3755   63.9     34.1      34.5
## 3       100.0 74.2    18.18  109   64.9     51.3      15.0
## 4        64.6 27.7    48.62 1347   47.6     31.0      44.8
## 5        54.2 73.8    27.69 1294   61.3     34.3      22.1
## 6        74.0 18.0    67.87 6430   61.5     22.6      66.7

Q1: What is the mean of Average net price for all institutions?

mean(college$NetPrice, na.rm = TRUE)
## [1] 19886.82

The mean average net price is $19,886.82.

Q2: How many colleges are in each locale?

barplot(table(college$Locale), col = "yellow", main = "Number of Colleges by Locale", ylab = "Count")

The bar plot shows that more colleges are located in City locales than in any other locale. Fewer than 200 colleges are in Rural locales.
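The bar plot gives only a visual impression of the counts. A minimal sketch of how exact counts and percentage shares could be tabulated is shown here. The `locale` vector is made-up illustration data, not values from the dataset; in the report it would be replaced by `college$Locale`.

```r
# Made-up locale labels for illustration only; use college$Locale for the real data.
locale <- c("City", "City", "Suburb", "Town", "Rural", "City", "Town", NA)

# Count colleges per locale, largest first (table() drops NA by default).
counts <- sort(table(locale), decreasing = TRUE)
print(counts)

# Share of colleges in each locale, as percentages rounded to one decimal.
shares <- round(100 * prop.table(counts), 1)
print(shares)
```

Printing the table alongside the plot makes statements such as "fewer than 200" checkable against an exact number.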

Q3: What is the distribution of Average combined SAT scores?

hist(college$AvgSAT, main = "Histogram of Average combined SAT Scores", xlab = "SAT Scores", col = "blue")

Average combined SAT scores most frequently fall between 1000 and 1200. Few colleges have an average above 1400.
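A numeric summary can back up what the histogram suggests. The sketch below uses only the six `AvgSAT` values printed by `head(college)` above, so its numbers describe those six rows and not the full dataset; the variable names `avg_sat` and `in_band` are chosen here for illustration.

```r
# AvgSAT values from the six rows printed by head(college) above.
avg_sat <- c(929, 1195, NA, 1322, 935, 1278)

# Minimum, quartiles, median, mean, and maximum; summary() reports the NA count separately.
print(summary(avg_sat))

# Proportion of non-missing values that fall between 1000 and 1200 (inclusive).
in_band <- mean(avg_sat >= 1000 & avg_sat <= 1200, na.rm = TRUE)
print(in_band)
```

Applying the same two calls to `college$AvgSAT` would give the share of all reporting colleges in that band.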

Q4: What is the mean admission rate among schools?

mean(college$AdmitRate, na.rm = TRUE)
## [1] 0.6702025

The average admission rate among schools that report one is about 67%.
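Because `na.rm = TRUE` silently drops schools with no reported admission rate, it can help to state how many schools the mean is based on. The following is a small sketch using only the six `AdmitRate` values shown by `head(college)` above; the variable names are illustrative.

```r
# AdmitRate values from the six rows printed by head(college) above.
admit_rate <- c(0.9027, 0.9181, NA, 0.8123, 0.9787, 0.5330)

n_missing <- sum(is.na(admit_rate))    # schools with no reported admission rate
n_used <- sum(!is.na(admit_rate))      # schools that enter the mean
mean_rate <- mean(admit_rate, na.rm = TRUE)

cat("missing:", n_missing, " used:", n_used, " mean:", round(mean_rate, 4), "\n")
```

The same counts on `college$AdmitRate` would show how much of the dataset the 67% figure actually covers.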

Q5: What is the range of faculty salaries across schools?

hist(college$FacSalary, main = "Histogram of Faculty Salaries", xlab = "Faculty Salary", col = "pink")

From the histogram, we see that the largest number of schools pay full-time faculty an average monthly salary between $5,000 and $10,000.
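The question asks for the range, which the histogram shows only approximately. A minimal sketch of computing it directly is given here, using the six `FacSalary` values printed by `head(college)` above as sample input; the variable names are illustrative.

```r
# FacSalary values from the six rows printed by head(college) above.
fac_salary <- c(6983, 10640, 3866, 9391, 7399, 10016)

# range() returns c(minimum, maximum); diff() of that pair is the width of the range.
salary_range <- range(fac_salary, na.rm = TRUE)
salary_width <- diff(salary_range)

print(salary_range)
print(salary_width)
```

On the full data, `range(college$FacSalary, na.rm = TRUE)` would give the lowest and highest average monthly salary across all schools.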

Q6: What is the average cost of in-state tuition and fees?

mean(college$TuitionIn, na.rm = TRUE)
## [1] 21948.55

The average cost of in-state tuition and fees is $21,948.55.

Q7: How many colleges are in each region?

barplot(table(college$Region), col = "yellow", main = "Number of Colleges by Region", ylab = "Count")

The Northeast region has the most colleges, followed closely by the Midwest, Southeast, and West regions. The U.S. Territories have very few colleges.

Q8: What is the range of completion rate among schools?

hist(college$CompRate, main = "Range of Completion Rate", xlab = "CompRate", col = 'orange')

From the histogram, most schools have a 40%-60% completion rate.
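To check a statement about which interval holds the most schools, the values can be binned and counted explicitly. The sketch below assumes 20-point bins, which is a choice made here and not necessarily the bin width `hist()` picked, and uses the six `CompRate` values printed by `head(college)` above.

```r
# CompRate values (percent) from the six rows printed by head(college) above.
comp_rate <- c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87)

# Lowest and highest completion rate in the sample.
print(range(comp_rate, na.rm = TRUE))

# Assign each value to a 20-point bin and count the schools in each bin.
bins <- cut(comp_rate, breaks = seq(0, 100, by = 20), include.lowest = TRUE)
bin_counts <- table(bins)
print(bin_counts)
```

Running the same code on `college$CompRate` would show how many schools fall in the 40% to 60% bin compared with the others.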

Q9: How does the net price vary across regions?

boxplot(NetPrice ~ Region, data = college, col = "blue", main = "Net Price by Region", xlab = "Region", ylab = "Net Price")

The Northeast and West regions have higher net prices than the other regions. The Territory region is the lowest, with a typical net price under $10,000.
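A boxplot marks the median of each group, so the comparison can also be stated numerically with a per-region median. The sketch below uses a made-up `demo` data frame purely for illustration; its values are not from the dataset, and in the report `demo` would be replaced by `college`.

```r
# Made-up rows for illustration only; use the college data frame for the real values.
demo <- data.frame(
  Region = c("Northeast", "Northeast", "West", "West", "Territory", "Territory"),
  NetPrice = c(24000, 26000, 21000, NA, 8000, 9000)
)

# Median net price within each region, ignoring missing values.
median_by_region <- tapply(demo$NetPrice, demo$Region, median, na.rm = TRUE)

# Show regions from most to least expensive.
print(sort(median_by_region, decreasing = TRUE))
```

With the real data the call would be `tapply(college$NetPrice, college$Region, median, na.rm = TRUE)`.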

Q10: How much does the average debt vary among students?

sd(college$Debt, na.rm = TRUE)
## [1] 5360.986

The standard deviation of student debt is approximately $5,361, which indicates substantial variation in how much students borrow across different colleges.
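Whether a standard deviation counts as large depends on the size of the values it describes. One way to put it in context, sketched here under the assumption that a relative measure is wanted, is the coefficient of variation (standard deviation divided by the mean) together with the interquartile range. The input is the six `Debt` values printed by `head(college)` above, so the results describe only those rows.

```r
# Debt values from the six rows printed by head(college) above.
debt <- c(1068, 3755, 109, 1347, 1294, 6430)

debt_sd <- sd(debt, na.rm = TRUE)                 # spread in the original units
debt_cv <- debt_sd / mean(debt, na.rm = TRUE)     # spread relative to the mean
debt_iqr <- IQR(debt, na.rm = TRUE)               # width of the middle 50% of values

print(c(sd = debt_sd, cv = debt_cv, iqr = debt_iqr))
```

The interquartile range is less sensitive to a few extreme schools than the standard deviation, so reporting both gives a fuller picture of the variation.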

Summary

In this project, I explored the “CollegeScores4yr” data. Using descriptive statistical methods outlined in Chapter 6, I analyzed ten statistical questions in an R Markdown document. The dataset includes variables such as average SAT scores, tuition costs, completion rates, and student demographics. The goal of the project was to answer these questions using techniques such as the mean, standard deviation, bar plots, histograms, and boxplots.
