Research Question:
How do cost-related factors vary across universities, and how do these factors influence educational success and accessibility?
As a demonstration of what I’ve learned, I will use Posit to compute and describe the following statistics for cost: mean, median, variance, standard deviation, percentile, and correlation.
Using various graphs and plots, I will illustrate the distribution of cost and analyze how cost relates to region, debt, and completion rate.
Data Collection:
I use data on a sample of 2,012 American colleges and universities, collected by the Department of Education for the College Scorecard, which I accessed through the Lock5Stat website (https://www.lock5stat.com/index.html).
Data Cleaning:
As instructed by my professor, Dr. Zhang, missing values will be ignored for the sake of simplicity. Outliers and data inconsistencies will not be addressed in this report. However, with a large sample of 2,012 colleges and universities, the influence of any individual outlier or inconsistency on the results should be small.
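As a minimal sketch of what ignoring missing values looks like in R, the example below uses the six AdmitRate values from the data preview in this report, one of which is NA. The variable names are my own, and the report's actual commands are not shown, so this is an assumption about the approach rather than the exact code used.

```r
# First six AdmitRate values from the data preview; the third is missing.
admit_rate <- c(0.9027, 0.9181, NA, 0.8123, 0.9787, 0.5330)

# By default, a single NA makes the whole result NA.
mean_with_na <- mean(admit_rate)

# na.rm = TRUE drops missing values before computing the statistic.
mean_ignoring_na <- mean(admit_rate, na.rm = TRUE)

# Count how many values were dropped.
n_missing <- sum(is.na(admit_rate))

print(mean_with_na)
print(mean_ignoring_na)
print(n_missing)
```

The same `na.rm = TRUE` argument is accepted by `median()`, `var()`, `sd()`, and `quantile()`.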
My Questions: (Final 10 questions are bolded)
Based on my understanding of the data, I came up with the following questions:
ChatGPT Questions: (Final 10 questions are bolded)
## Name State ID Main
## 1 Alabama A & M University AL 100654 1
## 2 University of Alabama at Birmingham AL 100663 1
## 3 Amridge University AL 100690 1
## 4 University of Alabama in Huntsville AL 100706 1
## 5 Alabama State University AL 100724 1
## 6 The University of Alabama AL 100751 1
## Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
## MainDegree HighDegree Control Region Locale Latitude Longitude AdmitRate
## 1 3 4 Public Southeast City 34.78337 -86.56850 0.9027
## 2 3 4 Public Southeast City 33.50570 -86.79935 0.9181
## 3 3 4 Private Southeast City 32.36261 -86.17401 NA
## 4 3 4 Public Southeast City 34.72456 -86.64045 0.8123
## 5 3 4 Public Southeast City 32.36432 -86.29568 0.9787
## 6 3 4 Public Southeast City 33.21187 -87.54598 0.5330
## MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1 18 929 0 4824 2.5 90.7 0.9 0.2 5.6 6.6
## 2 25 1195 0 12866 57.8 25.9 3.3 5.9 7.1 25.2
## 3 NA NA 1 322 7.1 14.3 0.6 0.3 77.6 54.4
## 4 28 1322 0 6917 74.2 10.7 4.6 4.0 6.5 15.0
## 5 18 935 0 4189 1.5 93.8 1.0 0.3 3.5 7.7
## 6 28 1278 0 32387 78.5 10.1 4.7 1.2 5.6 7.9
## NetPrice Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1 15184 22886 9857 18236 9227 7298 6983
## 2 17535 24129 8328 19032 11612 17235 10640
## 3 9649 15080 6900 6900 14738 5265 3866
## 4 19986 22108 10280 21480 8727 9748 9391
## 5 12874 19413 11068 19396 9003 7983 7399
## 6 21973 28836 10780 28100 13574 10894 10016
## FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1 71.3 71.0 23.96 1068 56.4 36.6 23.6
## 2 89.9 35.3 52.92 3755 63.9 34.1 34.5
## 3 100.0 74.2 18.18 109 64.9 51.3 15.0
## 4 64.6 27.7 48.62 1347 47.6 31.0 44.8
## 5 54.2 73.8 27.69 1294 61.3 34.3 22.1
## 6 74.0 18.0 67.87 6430 61.5 22.6 66.7
## [1] 34277.31
The average cost of attending university within the data is $34,277.31. Also known as the mean, the average describes the center of the data and is found by adding up all costs and dividing by the number of universities. Unlike the median, the mean is pulled toward extreme values, so outliers affect it.
## [1] 30699
The median cost of attending university within the data is $30,699. The median represents the 50th percentile of the data set, which means half the universities cost more and half cost less than $30,699. Unlike the mean, the median is less sensitive to outliers.
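To illustrate how the mean and median respond differently to outliers, here is a small sketch using only the six Cost values from the data preview, so the results will not match the full-sample figures above. The extra value of $90,000 is hypothetical and is added only to show the effect of one extreme observation.

```r
# First six Cost values from the data preview (not the full sample).
cost <- c(22886, 24129, 15080, 22108, 19413, 28836)

mean_cost <- mean(cost, na.rm = TRUE)
median_cost <- median(cost, na.rm = TRUE)

# Add one hypothetical, very expensive university (not from the data).
cost_with_outlier <- c(cost, 90000)

# How far each statistic moves when the outlier is added.
mean_shift <- mean(cost_with_outlier) - mean_cost
median_shift <- median(cost_with_outlier) - median_cost

print(c(mean = mean_cost, median = median_cost))
print(c(mean_shift = mean_shift, median_shift = median_shift))
```

In this sketch the mean moves by several thousand dollars while the median moves by a few hundred, which is the sense in which the median is less sensitive to outliers.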
## [1] 233433900
The variance of cost to attend university is 233,433,900. Variance describes the spread of the data around the mean, but because it is measured in squared dollars, this number is difficult to interpret on its own. Therefore, I will use standard deviation to make better sense of the data.
## [1] 15278.54
The standard deviation of cost to attend university is $15,278.54. Using the mean I calculated above, this means that about 68% of universities would cost between $18,998.77 and $49,555.85, assuming a normal distribution. To find out whether this data set follows a normal distribution, I will create a density plot and analyze its shape.
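The arithmetic behind this interval can be reproduced from the reported values alone. The sketch below also includes a small helper, `share_within_one_sd()` (a hypothetical name of my own), that checks what share of a sample actually falls within one standard deviation of its mean; it is applied here only to the six Cost values from the data preview, not to the full sample.

```r
# Values reported in this report.
mean_cost <- 34277.31
var_cost <- 233433900

# Standard deviation is the square root of the variance.
sd_cost <- sqrt(var_cost)

# One standard deviation on either side of the mean.
lower <- mean_cost - sd_cost
upper <- mean_cost + sd_cost
print(round(c(sd = sd_cost, lower = lower, upper = upper), 2))

# Hypothetical helper: empirical share of values within one sd of the mean.
share_within_one_sd <- function(x) {
  x <- x[!is.na(x)]
  mean(abs(x - mean(x)) <= sd(x))
}

# First six Cost values from the data preview (not the full sample).
first_six_cost <- c(22886, 24129, 15080, 22108, 19413, 28836)
share_first_six <- share_within_one_sd(first_six_cost)
print(share_first_six)
```

Running the helper on the full Cost column would show how close the data comes to the 68% figure without assuming normality.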
## 95%
## 64595.05
The 95th percentile of university cost is $64,595.05, meaning the most expensive 5% of universities cost $64,595.05 or more. This number is more than double the median university cost of $30,699.
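A percentile like this can be computed with `quantile()`, which labels its result `95%` as in the output above. The sketch below uses only the six Cost values from the data preview, so its result differs from the full-sample value; it also checks the "more than double" comparison using the two reported figures.

```r
# First six Cost values from the data preview (not the full sample).
cost <- c(22886, 24129, 15080, 22108, 19413, 28836)

# 95th percentile; R's default method interpolates between sorted values.
p95_first_six <- quantile(cost, probs = 0.95, na.rm = TRUE)
print(p95_first_six)

# Ratio of the reported 95th percentile to the reported median.
ratio_reported <- 64595.05 / 30699
print(round(ratio_reported, 2))
```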
## [1] 0.5870019
The correlation between cost and completion rate is 0.5870019. Because the correlation is greater than 0, the relationship is positive, and because its value is about 0.59 rather than close to 1, the relationship is moderate but not strong. As cost increases, completion rate tends to increase as well, but there are other factors at play.
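As a sketch of how such a correlation can be computed, the example below applies `cor()` to the six Cost and CompRate pairs from the data preview; with so few rows, the result will not match the full-sample value. It also squares the reported correlation, which gives the share of variation in one variable that is linearly associated with the other, one way to quantify the "other factors at play".

```r
# First six Cost and CompRate values from the data preview (not the full sample).
cost <- c(22886, 24129, 15080, 22108, 19413, 28836)
comp_rate <- c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87)

# Pearson correlation; use = "complete.obs" skips rows with missing values.
r_first_six <- cor(cost, comp_rate, use = "complete.obs")
print(r_first_six)

# Square of the reported full-sample correlation.
r_reported <- 0.5870019
r_squared_reported <- r_reported^2
print(round(r_squared_reported, 3))
```

With a squared correlation of roughly 0.34, about a third of the variation in completion rate is linearly associated with cost, leaving about two-thirds to other factors.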
The histogram above illustrates the distribution of university cost in bins of $5,000. It shows the frequency of high outliers above $70,000 and low outliers below $10,000. The mode of this histogram is the $20,000 to $25,000 bin, which contains roughly 340 universities.
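A histogram with $5,000 bins can be sketched as follows. The example uses only the six Cost values from the data preview, and the way the break points are built is my own choice rather than the report's actual code.

```r
# First six Cost values from the data preview (not the full sample).
cost <- c(22886, 24129, 15080, 22108, 19413, 28836)

# Break points every $5,000, covering the full range of the data.
bin_width <- 5000
breaks <- seq(floor(min(cost) / bin_width) * bin_width,
              ceiling(max(cost) / bin_width) * bin_width,
              by = bin_width)

# Draw the histogram; hist() also returns the bin counts.
h <- hist(cost, breaks = breaks, col = "lightblue",
          main = "Distribution of Cost", xlab = "Cost ($)")

# Locate the modal bin (the bin with the highest count).
modal_bin <- which.max(h$counts)
modal_range <- c(h$breaks[modal_bin], h$breaks[modal_bin + 1])
print(h$counts)
print(modal_range)
```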
As opposed to the histogram shown above, the density plot of cost uses a smooth line to illustrate a continuous progression of cost. To answer my question from section Q4, this data set does not follow a bell-shaped curve, as it rises steeply from $10,000 to $20,000 before leveling out. This graph is positively skewed, with a right tail that extends past $80,000.
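A density plot can be sketched with `density()`, again using only the six Cost values from the data preview. Two details are worth keeping in mind when reading such a plot: by default the smoothed curve is drawn somewhat beyond the smallest and largest observations, so the ends of the curve are not themselves data points, and a mean above the median is a quick numeric sign of positive skew.

```r
# First six Cost values from the data preview (not the full sample).
cost <- c(22886, 24129, 15080, 22108, 19413, 28836)

# Kernel density estimate with R's default bandwidth.
d <- density(cost, na.rm = TRUE)
plot(d, main = "Density of Cost", xlab = "Cost ($)")

# The curve's x-range extends past the observed minimum and maximum.
curve_extends_past_data <- min(d$x) < min(cost) && max(d$x) > max(cost)
print(curve_extends_past_data)

# Reported full-sample mean and median: mean > median suggests right skew.
reported_mean <- 34277.31
reported_median <- 30699
mean_minus_median <- reported_mean - reported_median
print(mean_minus_median)
```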
As the side-by-side box-and-whisker plots above show, cost is related to regional location in the US. The thick black line represents the median, while the lower and upper boundaries of the blue boxes represent the 1st and 3rd quartiles, respectively. The height of the blue box is the interquartile range, and the box contains the middle 50% of the data. The “whiskers” extend to the smallest and largest values that are not classified as outliers. Circles represent outliers, which lie beyond the whiskers.
Attending university in the Northeast is most expensive, with a median of about $40,000 and a 3rd quartile of about $55,000, followed by the Midwest, West, Southeast, and Territory regions.
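The pieces of a box plot can be inspected numerically. The sketch below uses the six rows from the data preview, which are all in the Southeast, so it draws a single box rather than the full regional comparison; the data frame name `first_six` is my own. It assumes R's default rule, under which points more than 1.5 times the box height beyond the box are drawn as outliers.

```r
# First six rows from the data preview (all Southeast; not the full sample).
first_six <- data.frame(
  Cost = c(22886, 24129, 15080, 22108, 19413, 28836),
  Region = factor(rep("Southeast", 6))
)

# One box per region; boxplot() also returns the plotted statistics.
b <- boxplot(Cost ~ Region, data = first_six, col = "lightblue",
             ylab = "Cost ($)")

# Lower whisker, lower hinge, median, upper hinge, upper whisker.
bs <- boxplot.stats(first_six$Cost)
print(bs$stats)

# Outlier fences: 1.5 box heights beyond the hinges.
box_height <- bs$stats[4] - bs$stats[2]
lower_fence <- bs$stats[2] - 1.5 * box_height
upper_fence <- bs$stats[4] + 1.5 * box_height
print(c(lower_fence = lower_fence, upper_fence = upper_fence))

# Values outside the fences would be drawn as circles.
print(bs$out)
```

Applying the same `boxplot(Cost ~ Region, ...)` call to the full data set would produce one box per region.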
In the scatterplot matrix shown above, cost, debt, and completion rate are compared in pairs simultaneously. Cost and debt have approximately zero correlation: as cost increases, debt remains about constant, although debt is highest around $20,000 to $40,000.
Cost and completion rate have a moderate positive correlation, which supports my assessment in Q6: as cost goes up, completion rate tends to go up as well. However, many lower-cost universities have similarly high completion rates, which weakens the correlation.
Debt and completion rate have essentially no correlation, as debt remains roughly constant across completion rates.
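A scatterplot matrix and the matching table of pairwise correlations can be sketched with `pairs()` and `cor()`. The example uses only the six rows from the data preview, so the correlations it prints will not reproduce the full-sample patterns described above; the data frame name `first_six` is my own.

```r
# First six rows from the data preview (not the full sample).
first_six <- data.frame(
  Cost = c(22886, 24129, 15080, 22108, 19413, 28836),
  Debt = c(1068, 3755, 109, 1347, 1294, 6430),
  CompRate = c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87)
)

# Scatterplot of every pair of variables.
pairs(first_six, main = "Cost, Debt, and Completion Rate")

# Pairwise Pearson correlations, skipping rows with missing values.
cor_matrix <- cor(first_six, use = "complete.obs")
print(round(cor_matrix, 2))
```

Reading the correlation matrix alongside the plots gives a numeric check on visual judgments such as "approximately zero correlation".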
To reiterate, my research question for this project was: How do cost-related factors vary across universities, and how do these factors influence educational success and accessibility?
In this data set, attending a typical college or university costs about $30,000 to $34,000 per year (the median and mean, respectively). However, costs vary widely, with most universities costing from $20,000 to $50,000 per year. The distribution of cost is positively skewed, as a substantial number of universities cost from $50,000 to $70,000 per year.
More expensive universities tend to have higher completion rates, although a correlation alone does not show that paying more causes students to finish, and many lower-cost universities also have high completion rates. Attending university in the Northeast costs the most, while higher education in the Territory region is the most financially accessible. Cost does not have a strong correlation with debt.
In summary, if I were to advise a random high school student on where to attend college, I would advise them to consider an expensive university only if they can comfortably afford it. Otherwise, a medium-cost university should do just fine.