1. Introduction

For this project, we are using data from lock5stat.com, specifically the “CollegeScores4yr” data set.

Based on my own understanding of this data, I propose the following 10 questions:

Ten additional questions were proposed using ChatGPT:

2. Analysis

From these two sets, 10 questions are selected for analysis (5 of my own and 5 from ChatGPT). Below are the first six rows of the data set. To view the full data set, please use the hyperlink above.

##                                  Name State     ID Main
## 1            Alabama A & M University    AL 100654    1
## 2 University of Alabama at Birmingham    AL 100663    1
## 3                  Amridge University    AL 100690    1
## 4 University of Alabama in Huntsville    AL 100706    1
## 5            Alabama State University    AL 100724    1
## 6           The University of Alabama    AL 100751    1
##                                                                Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
##   MainDegree HighDegree Control    Region Locale Latitude Longitude AdmitRate
## 1          3          4  Public Southeast   City 34.78337 -86.56850    0.9027
## 2          3          4  Public Southeast   City 33.50570 -86.79935    0.9181
## 3          3          4 Private Southeast   City 32.36261 -86.17401        NA
## 4          3          4  Public Southeast   City 34.72456 -86.64045    0.8123
## 5          3          4  Public Southeast   City 32.36432 -86.29568    0.9787
## 6          3          4  Public Southeast   City 33.21187 -87.54598    0.5330
##   MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1     18    929      0       4824   2.5  90.7      0.9   0.2   5.6      6.6
## 2     25   1195      0      12866  57.8  25.9      3.3   5.9   7.1     25.2
## 3     NA     NA      1        322   7.1  14.3      0.6   0.3  77.6     54.4
## 4     28   1322      0       6917  74.2  10.7      4.6   4.0   6.5     15.0
## 5     18    935      0       4189   1.5  93.8      1.0   0.3   3.5      7.7
## 6     28   1278      0      32387  78.5  10.1      4.7   1.2   5.6      7.9
##   NetPrice  Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1    15184 22886      9857     18236       9227        7298      6983
## 2    17535 24129      8328     19032      11612       17235     10640
## 3     9649 15080      6900      6900      14738        5265      3866
## 4    19986 22108     10280     21480       8727        9748      9391
## 5    12874 19413     11068     19396       9003        7983      7399
## 6    21973 28836     10780     28100      13574       10894     10016
##   FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1        71.3 71.0    23.96 1068   56.4     36.6      23.6
## 2        89.9 35.3    52.92 3755   63.9     34.1      34.5
## 3       100.0 74.2    18.18  109   64.9     51.3      15.0
## 4        64.6 27.7    48.62 1347   47.6     31.0      44.8
## 5        54.2 73.8    27.69 1294   61.3     34.3      22.1
## 6        74.0 18.0    67.87 6430   61.5     22.6      66.7

Q1. What is the mean cost of all the colleges in the data?

## [1] 34277.31

The mean cost across all colleges in the data set is $34,277.31.

Q2. What is the correlation between cost and average SAT score?

## [1] 0.5373884

The correlation between cost and average SAT score is 0.5373884, a moderate positive correlation. It is far from a perfect correlation, but colleges with higher average SAT scores do tend to cost more. Correlation alone does not show that one causes the other; a plausible explanation is that more prestigious schools, which usually cost more than other schools, also tend to enroll students with higher SAT scores.
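The value above is a Pearson correlation coefficient. As a sketch of what `cor(..., use = "complete.obs")` in the appendix computes, the hypothetical helper `pearson_r()` below drops any college missing either value, then divides the summed product of the deviations from each mean by the product of their spreads. It is an illustration, not the code that produced the reported number.

```r
# Hypothetical helper: Pearson correlation computed by hand.
# Rows with an NA in either vector are dropped first, which is what
# use = "complete.obs" does when cor() is given two vectors.
pearson_r <- function(x, y) {
  keep <- !is.na(x) & !is.na(y)          # keep only complete pairs
  x <- x[keep]
  y <- y[keep]
  dx <- x - mean(x)                      # deviations from the mean of x
  dy <- y - mean(y)                      # deviations from the mean of y
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}
```

Called as `pearson_r(college$Cost, college$AvgSAT)`, it should agree with `cor(college$Cost, college$AvgSAT, use = "complete.obs")`.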

Q3. What is the distribution of cost?

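This histogram shows the distribution of cost across the colleges in the data set. Most colleges cluster around $20,000, so many colleges sit on the more affordable end of the range. Because the mean from Q1 ($34,277.31) is well above this typical value, the distribution appears to be right-skewed, with a tail of expensive colleges pulling the mean upward.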
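To put a number on "most colleges" instead of reading it off the histogram bars, one option is to compute the share of colleges whose cost falls in a chosen band and to compare the median with the mean. The helper name `share_in_range()` and the $15,000 to $25,000 band (borrowed from the summary) are illustrative assumptions, not part of the original analysis, and no result from running it on the data is claimed here.

```r
# Hypothetical helper: fraction of non-missing values that fall in [lo, hi].
share_in_range <- function(x, lo, hi) {
  x <- x[!is.na(x)]              # ignore colleges with no reported value
  mean(x >= lo & x <= hi)        # mean of a logical vector = proportion TRUE
}
```

On the report's data this would be called as `share_in_range(college$Cost, 15000, 25000)`, alongside `median(college$Cost, na.rm = TRUE)` for comparison with the mean.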

Q4. What is the mean percentage of students receiving Pell grants in public and private schools?

## [1] 37.85296

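Across all colleges, public and private combined, the mean percentage of students receiving Pell grants is about 37.85%. This is a single pooled figure; the code does not compute separate means for public and private schools.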
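Q4 asks about public and private schools, but the appendix code pools every college into one mean. A sketch that splits the mean by the `Control` column (which, as the `head()` output shows, labels schools as Public or Private) could use `tapply()`. The helper name `pell_by_control()` is hypothetical, and its output on the real data has not been computed here.

```r
# Hypothetical helper: mean Pell percentage for each category of Control.
# tapply() groups df$Pell by df$Control and passes na.rm = TRUE on to mean().
pell_by_control <- function(df) {
  tapply(df$Pell, df$Control, mean, na.rm = TRUE)
}
```

With the report's data frame, `pell_by_control(college)` would return one mean per `Control` category instead of the single pooled 37.85%.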

Q5. What is the mean in-state and out-of-state tuition cost?

## [1] 21948.55
## [1] 25336.66

These are the average tuition costs for in-state students ($21,948.55) and out-of-state students ($25,336.66). On average, out-of-state students pay about $3,388 more, which reflects the higher tuition that colleges charge students who are not residents of the college's state.
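The pooled means do not show where the gap comes from. Out-of-state surcharges are mainly a feature of public institutions (the private Amridge University row in the `head()` output charges $6,900 either way), so a natural follow-up is the per-school gap averaged within each `Control` group. This is a sketch with a hypothetical helper name; its output on the real data has not been computed here.

```r
# Hypothetical helper: average out-of-state premium per Control group.
# Note: "TuitonOut" is spelled this way in the data set itself.
tuition_gap_by_control <- function(df) {
  gap <- df$TuitonOut - df$TuitionIn            # NA if either value is missing
  tapply(gap, df$Control, mean, na.rm = TRUE)   # mean gap within each group
}
```

With the report's data frame this would be called as `tuition_gap_by_control(college)`.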

Q6. What is the average graduation rate of college institutions?

The box plot summarizes the distribution of graduation (completion) rates across colleges. Its five statistics, in order from the end of the lower whisker to the end of the upper whisker, are:

Lower whisker: 0.00
Lower hinge: 38.18
Median: 52.45
Upper hinge: 66.67
Upper whisker: 100.00
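The five numbers above look like the `stats` matrix returned by `boxplot()`, but the appendix does not show the call that printed them. As a sketch, `boxplot.stats()` (from base R's grDevices package) returns the same five values without drawing anything; the helper name `box_five()` and the labels are assumptions added for readability.

```r
# Hypothetical helper: the five statistics boxplot() draws, with labels.
# boxplot.stats() ignores NA values. Whiskers stop at the most extreme
# point within 1.5 times the hinge spread, so they need not equal min/max.
box_five <- function(x) {
  s <- boxplot.stats(x)$stats
  names(s) <- c("lower whisker", "lower hinge", "median",
                "upper hinge", "upper whisker")
  s
}
```

Calling `box_five(college$CompRate)` should reproduce the values listed above.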

Q7. What is the relationship between cost and admission rates?

This scatter plot shows the relationship between cost and admission rate. The smoothed trend line drawn by `scatter.smooth()` slopes downward: colleges that cost more tend to have lower admission rates.
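The `scatter.smooth()` curve is a LOESS smoother, which shows the direction of the trend but not its size. A straight-line fit summarizes it in one number, at the cost of ignoring any curvature. The sketch below is one way to get that slope; the helper name `admit_slope()` is hypothetical, and no fitted value on the real data is claimed here.

```r
# Hypothetical helper: slope of a straight-line fit of AdmitRate on Cost.
# lm() drops rows where either variable is NA (its default na.action).
admit_slope <- function(df) {
  fit <- lm(AdmitRate ~ Cost, data = df)
  unname(coef(fit)["Cost"])      # change in admission rate per extra dollar
}
```

Multiplying `admit_slope(college)` by 10,000 would express the trend as the change in admission rate per $10,000 of cost.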

Q8. What is the proportion of colleges located in each region of the US?

## 
##   Midwest Northeast Southeast Territory      West 
##       492       552       475        48       445

The counts and the pie chart above show the number of colleges located in each region of the US.
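Q8 asks for proportions, while the table reports counts. Dividing each count by the total (2,012 colleges) converts one into the other; the sketch below does this with `prop.table()`, using the counts copied from the `table()` output above.

```r
# Region counts copied from the table() output above.
region_counts <- c(Midwest = 492, Northeast = 552, Southeast = 475,
                   Territory = 48, West = 445)

# prop.table() divides each count by the total number of colleges.
region_props <- prop.table(region_counts)

# Express as percentages, rounded to one decimal place.
round(100 * region_props, 1)
```

Rounded, this gives about 24.5% Midwest, 27.4% Northeast, 23.6% Southeast, 2.4% Territory, and 22.1% West.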

Q9. What is the median and interquartile range (IQR) of the percentage of female students?

## [1] 59.15
## [1] 13.175

The median percentage of female students is 59.15%, and the interquartile range is 13.175 percentage points. At the typical college, women therefore make up a modest majority of students, and the middle 50% of colleges fall within a band about 13 percentage points wide.
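The interquartile range is the distance between the 75th and 25th percentiles. The hypothetical helper below computes it directly from `quantile()` to make that definition concrete; both it and `IQR()` use R's default quantile method (type 7), so they should agree.

```r
# Hypothetical helper: IQR computed as Q3 minus Q1.
iqr_by_hand <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  q[2] - q[1]                    # upper quartile minus lower quartile
}
```

On the report's data, `iqr_by_hand(college$Female)` should match the 13.175 returned by `IQR(college$Female, na.rm = TRUE)`.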

Q10. Does a higher undergraduate enrollment correspond to a higher or lower graduation rate?

## [1] 0.1678195

The correlation between enrollment and graduation (completion) rate is about 0.168. This indicates a very weak positive correlation: larger colleges have slightly higher completion rates on average, but enrollment says little about any individual college's rate.
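One way to see why 0.168 counts as very weak: for a simple straight-line fit, the square of the Pearson correlation equals R², the fraction of the variation in completion rate that enrollment accounts for. The arithmetic, using the correlation reported above:

```r
r <- 0.1678195                 # correlation reported above
r_squared <- r^2               # R-squared of a simple linear fit
round(100 * r_squared, 1)      # about 2.8 (percent)
```

In other words, enrollment accounts for under 3% of the variation in completion rates.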

3. Summary

Analyzing the selected questions reveals several things about US colleges. Starting with question 1, the average total cost is about $34,000. Whether this is high or low depends on the person, but if a loan covered this amount and were repaid over 10 years, the payment, excluding interest, would be about $285.64 per month. This seems fairly manageable, as the US national average wage for 2023, according to the Social Security Administration, is $66,621.80, which makes the payment roughly 5% of an average monthly wage. However, other bills such as groceries, insurance, and mortgages could still strain the budget of the average US citizen. The distribution of cost, shown in the histogram above, paints a more favorable picture for those with financial struggles: most colleges fall within the $15,000 to $25,000 range, so a student may be able to attend a college that costs less than the data set's average.
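The repayment figures follow from simple arithmetic. The sketch below reproduces them under the same assumptions as the text: the Q1 mean cost as the loan amount, 120 equal monthly payments, and no interest.

```r
avg_cost <- 34277.31                    # mean Cost from Q1
months   <- 10 * 12                     # 10-year repayment plan
monthly_payment <- avg_cost / months    # no interest, as in the text

avg_wage <- 66621.80                    # SSA national average wage, 2023
share_of_income <- monthly_payment / (avg_wage / 12)

round(monthly_payment, 2)               # 285.64
round(100 * share_of_income, 1)         # about 5.1 (percent)
```

Adding interest would raise the monthly payment, so these figures are a lower bound.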

The correlation between cost and average SAT score is about 0.537, a moderate positive correlation. Well-known or “prestigious” colleges are usually harder to get into, so applicants need higher test scores to be competitive, and these colleges also tend to charge more for tuition, room and board, and other expenses. The mean percentage of students receiving Pell grants is about 38%, which means that at the average college most students do not receive one; they may instead take out private loans, pay for school themselves, have a scholarship, not qualify, or not have applied for a Pell grant.

The in-state versus out-of-state comparison is not surprising, as tuition is generally higher for out-of-state students. Another unsurprising result is that admission rates tend to fall as cost rises, likely because more prestigious schools, which cost more to attend, generally have stricter admission requirements. What is surprising is the graduation rate. In the box plot, the median sits at 52.45%, with the lower hinge at 38.18% and the upper hinge at 66.67%. In other words, at the median college only about half of students complete their degree, and at roughly a quarter of colleges fewer than 38.18% do.

The pie chart shows how many colleges in this data set are located in each region of the US. The Northeast has the most at 552, followed by the Midwest (492), the Southeast (475), the West (445), and the Territories (48). Turning to undergraduate enrollment and graduation rate, there is a very weak positive correlation (about 0.1678) between enrollment and completion rate: completion rates rise as enrollment increases, but only barely. Lastly, the median percentage of female students is 59.15%, meaning that at the typical college, women slightly outnumber men.

These questions have given insight into different aspects of the data set, from correlations to demographics to averages. Analyzing the remaining questions, or proposing new ones, could surface further useful information that is not immediately apparent. Exploring the statistics behind data is important, but it must be done carefully; as Mark Twain wrote, quoting a remark he attributed to Benjamin Disraeli: “There are three kinds of lies: lies, damned lies, and statistics.”

4. Appendix: All code used for this report

knitr::opts_chunk$set(echo = TRUE)

college = read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)

mean(college$Cost, na.rm = TRUE)

cor(college$Cost, college$AvgSAT, use = "complete.obs")

hist(college$Cost, main = "Histogram of Cost", xlab = "Cost", col = "seagreen")

mean(college$Pell, na.rm = TRUE)

mean(college$TuitionIn, na.rm = TRUE)
mean(college$TuitonOut, na.rm = TRUE)

boxplot(college$CompRate, 
        main = "Average Graduation Rate of College Institutions",
        xlab = "Students",
        ylab = "Graduation Rate of students",
        col =  "seagreen")

scatter.smooth(college$Cost, college$AdmitRate,
               xlab = "Cost",
               ylab = "Admission Rate")

# Count colleges per region and draw the pie from the table itself,
# rather than retyping the counts by hand
region_counts <- table(college$Region)
region_counts
colors = c("deepskyblue3","dodgerblue","cyan3","mediumspringgreen","palegreen2")
pie(as.numeric(region_counts), labels = names(region_counts),
    main = "Colleges Located in Each Region of the US", col = colors)

median(college$Female, na.rm = TRUE)
IQR(college$Female, na.rm = TRUE)

cor(college$CompRate, college$Enrollment, use = "complete.obs")
scatter.smooth(college$CompRate, college$Enrollment,
               xlab = "Completion Rate",
               ylab = "Enrollment")

5. References

Social Security Administration. National Average Wage Index (2023). https://www.ssa.gov/oact/cola/AWI.html