1. Introduction

Research Question:

How do cost-related factors vary across universities, and how do these factors influence educational success and accessibility?

As a demonstration of what I’ve learned, I will use Posit to compute and describe the following summary statistics of cost: mean, median, variance, standard deviation, percentile, and correlation.

Using various graphs and plots, I will illustrate the distribution of cost and analyze how cost relates to region, debt, and completion rate.

2. Methodology

Data Collection:

I use data on a sample of 2,012 American colleges and universities, collected by the Department of Education for the College Scorecard, which I accessed through the Lock5Stat website (https://www.lock5stat.com/index.html).

Data Cleaning:

As instructed by my professor, Dr. Zhang, missing values are ignored for the sake of simplicity, and outliers and data inconsistencies are not addressed in this report. With a large sample of 2,012 colleges and universities, the influence of any individual outlier or inconsistency on the summary statistics should be limited.

My Questions: (Final 10 questions are bolded)

Based on my understanding of the data, I came up with the following questions:

  1. What is the average cost of attending university?
  2. What is the median cost of attending university?
  3. What is the distribution of cost using a histogram?
  4. What is the median of average net price?
  5. Do more expensive universities have higher average SAT scores?
  6. What is the 95th percentile of university cost?
  7. What is the standard deviation of cost to attend university?
  8. What is the relationship between cost and region using side-by-side boxplots?
  9. What does a bar plot of university control (Private, Public, Profit) look like?
  10. What does the scatter plot matrix of cost vs. debt vs. completion rate look like?

ChatGPT Questions: (Final 10 questions are bolded)

  1. What is the correlation between cost and completion rate?
  2. What is the correlation between cost and instructional spending per student?
  3. What is the density plot of cost?
  4. What is the variance of cost to attend university?
  5. How do different locales (city, rural, suburb) affect the tuition cost of universities?
  6. What is the relationship between faculty composition (full-time faculty) and completion rate?
  7. How does the percentage of minority students (Black, Hispanic, Asian, etc.) relate to university costs?
  8. Is there a significant relationship between enrollment size and tuition costs?
  9. How does the average tuition vary across different regions of the U.S.?
  10. Is there a difference in tuition between private, public, and for-profit universities?

3. Results and Analysis

##                                  Name State     ID Main
## 1            Alabama A & M University    AL 100654    1
## 2 University of Alabama at Birmingham    AL 100663    1
## 3                  Amridge University    AL 100690    1
## 4 University of Alabama in Huntsville    AL 100706    1
## 5            Alabama State University    AL 100724    1
## 6           The University of Alabama    AL 100751    1
##                                                                Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
##   MainDegree HighDegree Control    Region Locale Latitude Longitude AdmitRate
## 1          3          4  Public Southeast   City 34.78337 -86.56850    0.9027
## 2          3          4  Public Southeast   City 33.50570 -86.79935    0.9181
## 3          3          4 Private Southeast   City 32.36261 -86.17401        NA
## 4          3          4  Public Southeast   City 34.72456 -86.64045    0.8123
## 5          3          4  Public Southeast   City 32.36432 -86.29568    0.9787
## 6          3          4  Public Southeast   City 33.21187 -87.54598    0.5330
##   MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1     18    929      0       4824   2.5  90.7      0.9   0.2   5.6      6.6
## 2     25   1195      0      12866  57.8  25.9      3.3   5.9   7.1     25.2
## 3     NA     NA      1        322   7.1  14.3      0.6   0.3  77.6     54.4
## 4     28   1322      0       6917  74.2  10.7      4.6   4.0   6.5     15.0
## 5     18    935      0       4189   1.5  93.8      1.0   0.3   3.5      7.7
## 6     28   1278      0      32387  78.5  10.1      4.7   1.2   5.6      7.9
##   NetPrice  Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1    15184 22886      9857     18236       9227        7298      6983
## 2    17535 24129      8328     19032      11612       17235     10640
## 3     9649 15080      6900      6900      14738        5265      3866
## 4    19986 22108     10280     21480       8727        9748      9391
## 5    12874 19413     11068     19396       9003        7983      7399
## 6    21973 28836     10780     28100      13574       10894     10016
##   FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1        71.3 71.0    23.96 1068   56.4     36.6      23.6
## 2        89.9 35.3    52.92 3755   63.9     34.1      34.5
## 3       100.0 74.2    18.18  109   64.9     51.3      15.0
## 4        64.6 27.7    48.62 1347   47.6     31.0      44.8
## 5        54.2 73.8    27.69 1294   61.3     34.3      22.1
## 6        74.0 18.0    67.87 6430   61.5     22.6      66.7

Q1: What is the average cost of attending university?

## [1] 34277.31

The average cost of attending university in the data is $34,277.31. Also known as the mean, the average describes the center of the data and is found by adding up all costs and dividing by the number of universities. Unlike the median, the mean is pulled toward extreme values, so outliers affect it.

Q2: What is the median cost of attending university?

## [1] 30699

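The median cost of attending university in the data is $30,699. The median represents the 50th percentile of the data set, which means half the universities cost more and half cost less than $30,699. Compared with the mean, the median is less sensitive to outliers. Here the mean ($34,277.31) is higher than the median, which suggests that a group of expensive universities pulls the mean upward.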
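As a rough illustration of the difference between Q1 and Q2, the sketch below is written in Python (not the Posit code behind the results above) and uses only the six Cost values printed at the start of this section. The extra $80,000 row is a made-up value, added only to show how an outlier moves the mean far more than the median.

```python
from statistics import mean, median

# Cost values of the six universities printed at the start of Section 3
# (an illustrative subset only, not the full sample).
costs = [22886, 24129, 15080, 22108, 19413, 28836]

avg = mean(costs)     # sum of all costs divided by the number of universities
mid = median(costs)   # middle of the sorted costs (average of the two middle values)

# Hypothetical extra row: one very expensive university, invented for illustration.
with_outlier = costs + [80000]
avg_shift = mean(with_outlier) - avg
mid_shift = median(with_outlier) - mid

print(f"mean: {avg:.2f}, median: {mid:.1f}")
print(f"shift after adding the outlier -> mean: {avg_shift:.2f}, median: {mid_shift:.1f}")
```

The subset values will not match the full-sample results; the point is only that a single extreme value shifts the mean by thousands of dollars while the median moves by a few hundred.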

Q3: What is the variance of cost to attend university?

## [1] 233433900

The variance of cost to attend university is 233,433,900 (in squared dollars). Variance describes the spread of the data around the mean, but because it is measured in squared units, this number is difficult to interpret in isolation. Therefore, I will use the standard deviation to make better sense of the data.

Q4: What is the standard deviation of cost to attend university?

## [1] 15278.54

The standard deviation of cost to attend university is $15,278.54. Using the mean I calculated above, this means that about 68% of universities would cost between $18,998.77 and $49,555.85 (within one standard deviation of the mean) if cost were normally distributed. To find out whether this data set follows a normal distribution, I will create a density plot and analyze its shape.
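The arithmetic linking Q1, Q3, and Q4 can be checked with a short Python sketch (again, not the original Posit code). It takes the square root of the reported variance and forms the one-standard-deviation interval around the reported mean. The variance is used as printed, so it is already rounded.

```python
import math

# Summary values as reported in Q1, Q3, and Q4.
mean_cost = 34277.31
variance = 233433900        # squared dollars, as printed (rounded) in Q3
sd_reported = 15278.54      # dollars, as printed in Q4

# The standard deviation is the square root of the variance, back in dollars.
sd_from_variance = math.sqrt(variance)

# Interval within one standard deviation of the mean.
lower = mean_cost - sd_reported
upper = mean_cost + sd_reported

print(f"sd from variance: {sd_from_variance:.2f}")
print(f"mean +/- 1 sd: {lower:.2f} to {upper:.2f}")
```

The 68% figure attached to this interval holds only under a normal distribution, which is why the shape of the distribution is examined in Q8.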

Q5: What is the 95th percentile of university cost?

##      95% 
## 64595.05

The 95th percentile of university cost is $64,595.05, meaning the most expensive 5% of universities cost $64,595.05 or more. This number is more than double the median university cost of $30,699.
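To make the percentile calculation concrete, here is a minimal Python sketch that interpolates linearly between sorted values. The helper `percentile` is a hypothetical function written for this illustration. I believe this is the same interpolation rule R uses by default, but the sketch runs only on the six Cost values printed at the start of this section, so it does not reproduce the full-sample result.

```python
import math

def percentile(values, p):
    """Percentile by linear interpolation between sorted values (0 <= p <= 1)."""
    ordered = sorted(values)
    h = (len(ordered) - 1) * p              # fractional position in the sorted list
    lo = math.floor(h)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (h - lo) * (ordered[hi] - ordered[lo])

# Cost values of the six universities printed at the start of Section 3.
costs = [22886, 24129, 15080, 22108, 19413, 28836]

p95 = percentile(costs, 0.95)
p50 = percentile(costs, 0.50)   # the median is the 50th percentile

# Ratio of the reported full-sample 95th percentile to the reported median.
ratio = 64595.05 / 30699

print(f"subset 95th percentile: {p95:.2f}, subset median: {p50:.1f}")
print(f"reported 95th percentile / reported median: {ratio:.2f}")
```

The last line checks the "more than double" comparison directly from the two reported values.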

Q6: What is the correlation between cost and completion rate?

## [1] 0.5870019

The correlation between cost and completion rate is 0.5870019. Because the correlation is greater than 0, the relationship is positive, and because its magnitude is about 0.59 (well below 1), the relationship is moderate rather than strong. As cost increases, completion rate tends to increase as well, but other factors are at play.
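The correlation coefficient in Q6 is a Pearson correlation. A minimal Python sketch of that formula follows. The function `pearson_r` is a hypothetical helper written for this illustration, and it is applied only to the Cost and CompRate columns of the six universities printed at the start of this section, so its result should not be expected to match the full-sample value of 0.587.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Cost and CompRate for the six universities printed at the start of Section 3.
cost = [22886, 24129, 15080, 22108, 19413, 28836]
comp_rate = [23.96, 52.92, 18.18, 48.62, 27.69, 67.87]

r = pearson_r(cost, comp_rate)
print(f"correlation on the six-row subset: {r:.3f}")
```

The result always lies between -1 and 1. The sign gives the direction of the relationship, and the distance from 0 gives its strength.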

Q7: What is the distribution of cost using a histogram?

The histogram above illustrates the distribution of university cost in $5,000 bins. It shows the frequency of high outliers above $70,000 and low outliers below $10,000. The mode of this histogram is the $20,000 to $25,000 bin, which contains roughly 340 universities.
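The binning behind a histogram like this one can be sketched in a few lines of Python. The sketch assigns each cost to a $5,000-wide bin and counts the universities per bin, using only the six Cost values printed at the start of this section. It assumes each bin includes its lower edge and excludes its upper edge, which may differ from the convention of the plotting function used for the histogram above.

```python
from collections import Counter

BIN_WIDTH = 5000  # dollars, matching the bin width of the histogram

# Cost values of the six universities printed at the start of Section 3.
costs = [22886, 24129, 15080, 22108, 19413, 28836]

# Map each cost to the lower edge of its bin, then count universities per bin.
bin_counts = Counter((c // BIN_WIDTH) * BIN_WIDTH for c in costs)

for start in sorted(bin_counts):
    print(f"${start:,} to ${start + BIN_WIDTH:,}: {bin_counts[start]}")

# The modal bin is the one holding the most universities.
modal_bin = max(bin_counts, key=bin_counts.get)
print(f"modal bin starts at ${modal_bin:,}")
```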

Q8: What is the density plot of cost?

In contrast to the histogram shown above, the density plot of cost uses a smooth curve to illustrate the distribution of cost continuously. To answer my question from Q4, this data set does not follow a bell-shaped curve: the density rises steeply from $10,000 to $20,000 before leveling out. The distribution is positively skewed, and the plot shows that the most expensive universities cost over $80,000.
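The positive skew seen in the density plot can also be checked numerically from values already reported. One simple option, which was not part of the original analysis, is Pearson's median skewness coefficient, 3 × (mean − median) / standard deviation. It is positive when the mean sits above the median. A Python sketch using the results of Q1, Q2, and Q4:

```python
# Summary values as reported in Q1, Q2, and Q4.
mean_cost = 34277.31
median_cost = 30699
sd_cost = 15278.54

# Pearson's median skewness: positive when the mean exceeds the median,
# as expected for a distribution with a long right tail.
skewness = 3 * (mean_cost - median_cost) / sd_cost

print(f"Pearson median skewness: {skewness:.2f}")
```

A value near 0 would be expected for a symmetric, bell-shaped distribution, so a clearly positive value is consistent with the shape described above.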

Q9: What is the relationship between cost and region using side-by-side boxplots?

As the side-by-side box-and-whisker plots above show, cost is related to regional location in the US. The thick black line represents the median, while the lower and upper edges of the blue boxes represent the 1st and 3rd quartiles, respectively. The blue box spans the interquartile range, which contains the middle 50% of the data. The “whiskers” extend to the smallest and largest values that are not classified as outliers. Circles represent outliers, which lie beyond the whiskers.

Attending university in the Northeast is the most expensive, with a median of about $40,000 and a 3rd quartile of about $55,000, followed by the Midwest, West, Southeast, and Territory regions.
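The parts of a box-and-whisker plot described in Q9 can be computed by hand. The Python sketch below does this for the six Cost values printed at the start of this section, assuming the common rule that points more than 1.5 interquartile ranges beyond the box are drawn as outliers. The quartile method used here may differ slightly from the one used by the plotting function in the report, so treat the numbers as illustrative.

```python
from statistics import quantiles

# Cost values of the six universities printed at the start of Section 3.
costs = [22886, 24129, 15080, 22108, 19413, 28836]

# Quartiles: the box runs from q1 to q3, with the median line at q2.
q1, q2, q3 = quantiles(costs, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: the span of the middle 50% of the data

# Assumed outlier rule: anything beyond 1.5 * IQR from the box is an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [c for c in costs if c < lower_fence or c > upper_fence]

# Whiskers reach the most extreme values that are not outliers.
inside = [c for c in costs if lower_fence <= c <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(f"box: {q1} to {q3}, median line: {q2}, IQR: {iqr}")
print(f"whiskers: {whisker_low} to {whisker_high}, outliers: {outliers}")
```

Running the same steps separately for each region would give the numbers behind each box in a side-by-side plot.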

Q10: What does the scatter plot matrix of cost vs. debt vs. completion rate look like?

In the scatterplot matrix shown above, cost, debt, and completion rate are compared in pairs. Cost and debt have approximately zero correlation: as cost increases, debt remains about constant, although the highest debt values occur at costs of around $20,000 to $40,000.

Cost and completion rate have a moderately positive correlation, which supports my assessment in Q6: as cost goes up, so does completion rate. However, many universities at lower costs have similarly high completion rates, which weakens the correlation.

Debt and completion rate have essentially no correlation, as debt remains roughly constant across completion rates.

4. Conclusion

To reiterate, my research question for this project was: How do cost-related factors vary across universities, and how do these factors influence educational success and accessibility?

In this data set, attending a college or university typically costs about $30,000 to $34,000 (the median and the mean, respectively). However, the spread is large, with the majority of universities costing from $20,000 to $50,000 per year. The cost of attending university is positively skewed, as a substantial number of universities cost from $50,000 to $70,000 per year.

More expensive universities tend to have higher completion rates, although a correlation alone does not show that paying more causes students to finish, and a significant number of lower-cost universities still have high completion rates. Attending university in the Northeast costs the most, while higher education in the Territory region is the most financially accessible. Cost does not have a strong correlation with debt after graduation.

In summary, if I were to advise a high school student on where to attend college, I would advise them to consider an expensive university only if they can comfortably afford it. Otherwise, a medium-cost university should do just fine.

5. Citations