For this project, we are using data from lock5stat.com, specifically the “CollegeScores4yr” data set.
With this data, I propose the following 10 questions based on my own understanding of it:
10 additional questions will be proposed by using ChatGPT:
From these two sets, 10 questions will be selected (5 of my own and 5 from ChatGPT) and used for the analysis. Below are the first 6 rows of the data set. To view the full data set, please use the hyperlink above.
## Name State ID Main
## 1 Alabama A & M University AL 100654 1
## 2 University of Alabama at Birmingham AL 100663 1
## 3 Amridge University AL 100690 1
## 4 University of Alabama in Huntsville AL 100706 1
## 5 Alabama State University AL 100724 1
## 6 The University of Alabama AL 100751 1
## Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
## MainDegree HighDegree Control Region Locale Latitude Longitude AdmitRate
## 1 3 4 Public Southeast City 34.78337 -86.56850 0.9027
## 2 3 4 Public Southeast City 33.50570 -86.79935 0.9181
## 3 3 4 Private Southeast City 32.36261 -86.17401 NA
## 4 3 4 Public Southeast City 34.72456 -86.64045 0.8123
## 5 3 4 Public Southeast City 32.36432 -86.29568 0.9787
## 6 3 4 Public Southeast City 33.21187 -87.54598 0.5330
## MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1 18 929 0 4824 2.5 90.7 0.9 0.2 5.6 6.6
## 2 25 1195 0 12866 57.8 25.9 3.3 5.9 7.1 25.2
## 3 NA NA 1 322 7.1 14.3 0.6 0.3 77.6 54.4
## 4 28 1322 0 6917 74.2 10.7 4.6 4.0 6.5 15.0
## 5 18 935 0 4189 1.5 93.8 1.0 0.3 3.5 7.7
## 6 28 1278 0 32387 78.5 10.1 4.7 1.2 5.6 7.9
## NetPrice Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1 15184 22886 9857 18236 9227 7298 6983
## 2 17535 24129 8328 19032 11612 17235 10640
## 3 9649 15080 6900 6900 14738 5265 3866
## 4 19986 22108 10280 21480 8727 9748 9391
## 5 12874 19413 11068 19396 9003 7983 7399
## 6 21973 28836 10780 28100 13574 10894 10016
## FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1 71.3 71.0 23.96 1068 56.4 36.6 23.6
## 2 89.9 35.3 52.92 3755 63.9 34.1 34.5
## 3 100.0 74.2 18.18 109 64.9 51.3 15.0
## 4 64.6 27.7 48.62 1347 47.6 31.0 44.8
## 5 54.2 73.8 27.69 1294 61.3 34.3 22.1
## 6 74.0 18.0 67.87 6430 61.5 22.6 66.7
## [1] 34277.31
The mean cost for all the colleges in the given data set is $34,277.31
## [1] 0.5373884
The correlation between cost and average SAT score is 0.5373884, which is a moderate positive correlation. The correlation is far from perfect, but colleges with higher average SAT scores still tend to cost more. This makes sense: more prestigious schools usually cost more than other schools, and they also tend to enroll students with higher SAT scores.
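The sketch below shows what `cor(..., use = "complete.obs")` computes. It takes the Cost and AvgSAT values from the first six rows shown above, drops the row with a missing SAT score, and applies the Pearson formula by hand. The result is then checked against R's built-in `cor()`. With only five colleges the value will not match the full-data result; the point is the mechanics, not the number.

```r
# Cost and AvgSAT for the first six rows printed by head(college)
cost    <- c(22886, 24129, 15080, 22108, 19413, 28836)
avg_sat <- c(929, 1195, NA, 1322, 935, 1278)

# use = "complete.obs" keeps only rows where both values are present
ok <- complete.cases(cost, avg_sat)
x <- cost[ok]
y <- avg_sat[ok]

# Pearson r: summed co-deviations divided by the root of the product of summed squared deviations
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_builtin <- cor(cost, avg_sat, use = "complete.obs")
all.equal(r_manual, r_builtin)
```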
This histogram shows the distribution of cost across the colleges in the data set. Most colleges fall around the $20,000 range, meaning that many colleges are on the more affordable end of the cost scale.
## [1] 37.85296
The mean percentage of students receiving Pell grants, across public and private colleges, is about 37.85%.
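`mean(college$Pell)` averages one percentage per college, so a small college counts as much as a large one. The sketch below contrasts that unweighted mean with an enrollment-weighted mean, using the Pell and Enrollment values from the first six rows above. The weighted version rests on an assumption: that Pell and Enrollment describe the same student population. Under that assumption it approximates the share of all students, not the share at a typical college.

```r
# Pell (%) and Enrollment for the first six rows printed by head(college)
pell   <- c(71.0, 35.3, 74.2, 27.7, 73.8, 18.0)
enroll <- c(4824, 12866, 322, 6917, 4189, 32387)

# Unweighted: each college counts equally (what mean(college$Pell) computes)
unweighted <- mean(pell)

# Weighted by enrollment (assumes Pell and Enrollment refer to the same students)
weighted <- weighted.mean(pell, w = enroll)

c(unweighted = unweighted, weighted = weighted)
```

Here the largest college has the lowest Pell percentage, which pulls the weighted figure below the unweighted one.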
## [1] 21948.55
## [1] 25336.66
These are the mean in-state tuition ($21,948.55) and the mean out-of-state tuition ($25,336.66) across colleges. On average, students pay more to attend a college outside their home state.
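The two means above are computed separately with `na.rm = TRUE`, so each one could in principle be based on a different set of colleges. A paired comparison avoids this by looking at the out-of-state premium college by college. The sketch uses the TuitionIn and TuitonOut values from the first six rows above (the out-of-state column is spelled `TuitonOut` in the data). On the full data set, the same steps would apply to `college$TuitionIn` and `college$TuitonOut`.

```r
# TuitionIn and TuitonOut for the first six rows printed by head(college)
tuition_in  <- c(9857, 8328, 6900, 10280, 11068, 10780)
tuition_out <- c(18236, 19032, 6900, 21480, 19396, 28100)

# Keep only colleges that report both tuition figures
both <- complete.cases(tuition_in, tuition_out)

# Out-of-state premium for each college
gap <- tuition_out[both] - tuition_in[both]

mean(gap)       # average premium across these colleges
mean(gap > 0)   # share of colleges that charge out-of-state students more
```

In these six rows the private college (Amridge University) charges the same tuition either way. A paired view makes this kind of case visible, whereas a gap between two overall means hides it.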
The box plot shows the distribution of graduation (completion) rates across colleges. Its summary statistics are: lower whisker 0.00%, lower hinge 38.18%, median 52.45%, upper hinge 66.67%, and upper whisker 100.00%.
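The five numbers above are the `stats` component that R's `boxplot()` returns. The sketch below reproduces that component on the CompRate values from the first six rows above, calling `boxplot(..., plot = FALSE)` so nothing is drawn. The hinges are Tukey's hinges, as computed by `fivenum()`, and they can differ slightly from the quartiles given by `quantile()`. When there are no outliers, the whisker ends are simply the minimum and maximum.

```r
# CompRate for the first six rows printed by head(college)
comp_rate <- c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87)

# Compute box plot statistics without drawing the plot
b <- boxplot(comp_rate, plot = FALSE)

# Rows: lower whisker, lower hinge, median, upper hinge, upper whisker
b$stats

# Tukey's five-number summary; it matches b$stats here because there are no outliers
fivenum(comp_rate)
```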
This scatter plot shows the relationship between cost and admission rate. As cost goes up, the admission rate trends downward, as shown by the smoothed trend line in the graph.
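The trend line that `scatter.smooth()` draws is a loess curve, and `loess.smooth()` returns the points of that same curve as numbers. The sketch below uses hypothetical cost and admission-rate values, not the real data. It fits the curve and also reports a Spearman rank correlation, which summarizes a downward trend without assuming it is a straight line.

```r
# Hypothetical data: admission rate drifts down as cost rises, with a small wiggle
cost  <- seq(15000, 60000, length.out = 40)
admit <- 0.9 - cost / 100000 + 0.03 * sin(seq_along(cost))

# Points on the loess curve (same default smoother that scatter.smooth() draws)
trend <- loess.smooth(cost, admit)
head(cbind(cost = trend$x, fitted_admit = trend$y))

# Rank-based summary of the trend direction (negative = downward)
cor(cost, admit, method = "spearman")
```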
##
## Midwest Northeast Southeast Territory West
## 492 552 475 48 445
The table and pie chart above show the number of colleges located in each region of the US, including the US territories.
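A pie chart encodes each region's share of the total as an angle. The sketch below turns the counts from the table above into percentages, so the slices can be read as numbers. On the full data, `prop.table(table(college$Region))` would give the same shares directly.

```r
# Counts copied from the table(college$Region) output above
region_counts <- c(Midwest = 492, Northeast = 552, Southeast = 475,
                   Territory = 48, West = 445)

sum(region_counts)                           # total number of colleges
round(100 * prop.table(region_counts), 1)    # percentage of colleges in each region
names(which.max(region_counts))              # region with the most colleges
```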
## [1] 59.15
## [1] 13.175
The median percentage of female students at a college is 59.15%, and the interquartile range is 13.175 percentage points. So at a typical college there are somewhat more women than men, and the middle 50% of colleges have female percentages spread over a range of about 13 percentage points.
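The interquartile range is the distance between the 75th and 25th percentiles, so it is measured in the same units as the data (here, percentage points). The sketch below checks this relationship on the Female values from the first six rows above. `IQR()` uses the same default quantile method as `quantile()`.

```r
# Female (%) for the first six rows printed by head(college)
female <- c(56.4, 63.9, 64.9, 47.6, 61.3, 61.5)

median(female)

# IQR is the difference between the 75th and 25th percentiles
q <- quantile(female, c(0.25, 0.75))
IQR(female)
unname(q[2] - q[1])
```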
## [1] 0.1678195
The correlation between enrollment and graduation (completion) rate is about 0.168, which indicates a very weak positive relationship between the two.
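One way to see how weak a correlation of about 0.168 is: square it. For a straight-line fit, r² is the share of the variation in completion rate that enrollment accounts for. The sketch uses only the value reported above.

```r
# Correlation reported above between completion rate and enrollment
r <- 0.1678195

# Coefficient of determination for a simple linear fit
r_squared <- r^2
round(r_squared, 3)   # roughly 3% of the variation in completion rate
```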
The selected questions reveal quite a few things about US colleges. Starting with question 1, the mean total cost is about $34,000. Whether this is high or low depends on the person. If this amount were paid with loans on a 10-year repayment plan, the monthly payment excluding interest would be about $285.64 ($34,277.31 / 120 months). This seems fairly manageable, since the 2023 national average wage index published by the Social Security Administration is $66,621.80. However, other bills, such as groceries, insurance, and mortgages, could still strain the budget of the average US worker. The distribution of cost in the histogram above paints a more favorable picture for those with financial struggles. Most colleges fall within the $15,000 to $25,000 range, so you may be able to attend a college that costs less than the mean for this data set.
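The repayment figure above is simple division, and a small helper makes the arithmetic explicit. The function `monthly_payment()` below is hypothetical, written for illustration. With `rate = 0` it reproduces the no-interest figure. With a nonzero monthly rate it applies the standard fixed-payment amortization formula. The 5% annual rate in the last line is an assumed example, not a figure from the data.

```r
# Hypothetical helper: fixed monthly payment on a loan
# principal: amount borrowed; months: repayment term; rate: monthly interest rate
monthly_payment <- function(principal, months, rate = 0) {
  if (rate == 0) return(principal / months)           # no interest: even split
  principal * rate / (1 - (1 + rate)^(-months))       # standard amortization formula
}

avg_cost <- 34277.31
pay_no_interest <- monthly_payment(avg_cost, 120)
round(pay_no_interest, 2)

# Share of a monthly paycheck at the SSA 2023 average wage index ($66,621.80 per year)
round(pay_no_interest / (66621.80 / 12), 3)

# Assumed 5% annual rate, compounded monthly (illustration only)
round(monthly_payment(avg_cost, 120, rate = 0.05 / 12), 2)
```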
The correlation between cost and average SAT score was about 0.537, a moderate positive correlation. Well-known or “prestigious” colleges are usually harder to get into, so students need higher test scores to be competitive. These colleges also tend to have higher costs for tuition, room and board, and other expenses. On average, about 38% of a college's students receive Pell grants. This means most students pay in other ways: through private loans, their own funds, or scholarships. Others may not have applied for a Pell grant at all.
The in-state vs. out-of-state tuition comparison is not surprising, as it is generally more expensive for out-of-state students. Another unsurprising finding is that admission rates tend to go down as cost goes up. This fits the pattern that more prestigious schools, which cost more to attend, generally have stricter admission requirements. What is surprising is the graduation rate. In the box plot, the median is 52.45%, with the lower hinge at 38.18% and the upper hinge at 66.67%. This means that at the median college only about half of students graduate, and the middle half of colleges graduate between roughly 38% and 67% of their students.
The pie chart shows how many colleges in this data set are located in each region of the US. The Northeast has the most at 552, followed by the Midwest (492), the Southeast (475), the West (445), and the Territories (48). Moving on to undergraduate enrollment and graduation rate, there is a very weak positive correlation between the two (about 0.168). This suggests that colleges with larger enrollments have only slightly higher completion rates, if at all. Lastly, the median percentage of female students is 59.15%, meaning that at a typical college there are somewhat more women than men.
These questions have given insight into different aspects of the data set. Correlations, demographics, and averages all offer interesting takeaways. Analyzing the remaining questions could reveal further information that is not readily apparent, and could suggest new questions. Careful statistical analysis matters, because numbers can mislead when they are interpreted carelessly. As Mark Twain wrote, attributing the line to Benjamin Disraeli: “There are three kinds of lies: lies, damned lies, and statistics.”
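# R Markdown setup: show the code alongside its output
knitr::opts_chunk$set(echo = TRUE)
# Load the CollegeScores4yr data set from Lock5Stat and preview the first 6 rows
college <- read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)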
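# Mean total cost, ignoring missing values
mean(college$Cost, na.rm = TRUE)
# Correlation between cost and average SAT, using only colleges that report both
cor(college$Cost, college$AvgSAT, use = "complete.obs")
# Distribution of cost
hist(college$Cost, main = "Histogram of Cost", xlab = "Cost", col = "seagreen")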
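# Mean percentage of students receiving Pell grants
mean(college$Pell, na.rm = TRUE)
# Mean in-state and out-of-state tuition (the out-of-state column is spelled "TuitonOut" in the data)
mean(college$TuitionIn, na.rm = TRUE)
mean(college$TuitonOut, na.rm = TRUE)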
# Distribution of graduation (completion) rates across colleges
boxplot(college$CompRate,
        main = "Graduation (Completion) Rate of College Institutions",
        ylab = "Completion rate (%)",
        col = "seagreen")
# Cost vs. admission rate, with a loess-smoothed trend line
scatter.smooth(college$Cost, college$AdmitRate,
               xlab = "Cost",
               ylab = "Admission Rate")
# Number of colleges per region, computed from the data so the pie chart stays in sync with it
region_counts <- table(college$Region)
region_counts
colors <- c("deepskyblue3", "dodgerblue", "cyan3", "mediumspringgreen", "palegreen2")
pie(region_counts, labels = names(region_counts),
    main = "Colleges Located in Each Region of the US", col = colors)
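# Median and interquartile range of the percentage of female students
median(college$Female, na.rm = TRUE)
IQR(college$Female, na.rm = TRUE)
# Correlation between completion rate and enrollment, using only colleges that report both
cor(college$CompRate, college$Enrollment, use = "complete.obs")
scatter.smooth(college$CompRate, college$Enrollment,
               xlab = "Completion Rate",
               ylab = "Enrollment")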
Social Security Administration average wage index (2023) https://www.ssa.gov/oact/cola/AWI.html