1. Introduction

Background and Synopsis:

As a university student, I have been assigned by my professor to write a statistical report on college data to demonstrate my understanding of Chapter 6: Descriptive Statistics. This report uses the RStudio IDE to address 10 statistical questions, formulated jointly by me and ChatGPT, drawing on R's calculation tools and visual aids. I analyze each statistical method individually before summarizing my conclusions.

My report will center around cost, specifically as it relates to region, completion rate, and debt.

Purpose:

Definitions of Variables used in Analysis:

(The rest of the variable definitions can be found at https://www.lock5stat.com/datasets3e/Lock5DataGuide3e.pdf)

Region: Region of country

Enrollment: Undergraduate enrollment

NetPrice: Average net price (cost minus aid)

Cost: Average total cost for tuition, room, board, etc.

CompRate: Completion rate (percent who finish program within 150% of normal time)

Debt: Average debt for students who complete program

2. Methodology

Data Collection:

I use data on a sample of 2,012 American colleges and universities, collected by the Department of Education for the College Scorecard, which I accessed through the Lock5Stat website (https://www.lock5stat.com/index.html).

Data Cleaning:

As instructed by my professor, Dr. Zhang, missing values will be ignored for the sake of simplicity. Outliers and data inconsistencies will not be addressed in this report. However, with the large sample size of 2,012 colleges and universities, the influence of any single outlier or inconsistent record on the summary statistics should be small.
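As a minimal sketch of what ignoring missing values might look like in R, the example below uses the six AdmitRate values from the data preview in Section 3, where the third row is NA. The variable names are illustrative and not taken from the original analysis.

```r
# AdmitRate for the first six colleges in the data preview; row 3 is missing.
admit_rate <- c(0.9027, 0.9181, NA, 0.8123, 0.9787, 0.5330)

# By default, a single NA makes the result NA.
mean_with_na <- mean(admit_rate)

# na.rm = TRUE drops missing values before the calculation.
mean_ignoring_na <- mean(admit_rate, na.rm = TRUE)

# Count how many values were skipped.
n_missing <- sum(is.na(admit_rate))

print(mean_with_na)
print(mean_ignoring_na)
print(n_missing)
```

The same `na.rm = TRUE` argument is accepted by `median()`, `var()`, `sd()`, and `quantile()`.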

Descriptive Stat Methods:

My Questions: (Final 10 questions are bolded)

Based on my understanding of the data, I came up with the following questions:

  1. What is the average cost of attending university?
  2. What is the median cost of attending university?
  3. What is the distribution of cost using a histogram?
  4. What is the median of average net price?
  5. Do more expensive universities have higher average SAT scores?
  6. What is the 95th percentile of university cost?
  7. What is the standard deviation of cost to attend university?
  8. What is the relationship between cost and region using side-by-side boxplots?
  9. What does a bar plot of university control (Private, Public, Profit) look like?
  10. What does the scatter plot matrix of cost vs. debt vs. completion rate look like?

ChatGPT Questions: (Final 10 questions are bolded)

  1. What is the correlation between cost and completion rate?
  2. What is the correlation between cost and instructional spending per student?
  3. What is the density plot of cost?
  4. What is the variance of cost to attend university?
  5. How do different locales (city, rural, suburb) affect the tuition cost of universities?
  6. What is the relationship between faculty composition (full-time faculty) and completion rate?
  7. How does the percentage of minority students (Black, Hispanic, Asian, etc.) relate to university costs?
  8. Is there a significant relationship between enrollment size and tuition costs?
  9. How does the average tuition vary across different regions of the U.S.?
  10. Is there a difference in tuition between private, public, and for-profit universities?

3. Results and Analysis

##                                  Name State     ID Main
## 1            Alabama A & M University    AL 100654    1
## 2 University of Alabama at Birmingham    AL 100663    1
## 3                  Amridge University    AL 100690    1
## 4 University of Alabama in Huntsville    AL 100706    1
## 5            Alabama State University    AL 100724    1
## 6           The University of Alabama    AL 100751    1
##                                                                Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
##   MainDegree HighDegree Control    Region Locale Latitude Longitude AdmitRate
## 1          3          4  Public Southeast   City 34.78337 -86.56850    0.9027
## 2          3          4  Public Southeast   City 33.50570 -86.79935    0.9181
## 3          3          4 Private Southeast   City 32.36261 -86.17401        NA
## 4          3          4  Public Southeast   City 34.72456 -86.64045    0.8123
## 5          3          4  Public Southeast   City 32.36432 -86.29568    0.9787
## 6          3          4  Public Southeast   City 33.21187 -87.54598    0.5330
##   MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1     18    929      0       4824   2.5  90.7      0.9   0.2   5.6      6.6
## 2     25   1195      0      12866  57.8  25.9      3.3   5.9   7.1     25.2
## 3     NA     NA      1        322   7.1  14.3      0.6   0.3  77.6     54.4
## 4     28   1322      0       6917  74.2  10.7      4.6   4.0   6.5     15.0
## 5     18    935      0       4189   1.5  93.8      1.0   0.3   3.5      7.7
## 6     28   1278      0      32387  78.5  10.1      4.7   1.2   5.6      7.9
##   NetPrice  Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1    15184 22886      9857     18236       9227        7298      6983
## 2    17535 24129      8328     19032      11612       17235     10640
## 3     9649 15080      6900      6900      14738        5265      3866
## 4    19986 22108     10280     21480       8727        9748      9391
## 5    12874 19413     11068     19396       9003        7983      7399
## 6    21973 28836     10780     28100      13574       10894     10016
##   FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1        71.3 71.0    23.96 1068   56.4     36.6      23.6
## 2        89.9 35.3    52.92 3755   63.9     34.1      34.5
## 3       100.0 74.2    18.18  109   64.9     51.3      15.0
## 4        64.6 27.7    48.62 1347   47.6     31.0      44.8
## 5        54.2 73.8    27.69 1294   61.3     34.3      22.1
## 6        74.0 18.0    67.87 6430   61.5     22.6      66.7

Q1: What is the average cost of attending university?

## [1] 34277.31

The average cost of attending university within the data is $34,277.31. Also known as the mean, the average describes the center of the data and is found by adding up all costs and dividing by the number of universities. Unlike the median, the mean is pulled toward extreme values, so outliers affect it.
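The definition above can be sketched in R as follows. This is only an illustration: the `cost` vector holds the six Cost values from the data preview in Section 3, so its mean differs from the full-sample figure reported above.

```r
# Cost for the first six colleges in the data preview (not the full sample).
cost <- c(22886, 24129, 15080, 22108, 19413, 28836)

# Mean by definition: sum of all costs divided by the number of colleges.
mean_by_hand <- sum(cost) / length(cost)

# Built-in equivalent; na.rm = TRUE ignores missing values if any are present.
mean_builtin <- mean(cost, na.rm = TRUE)

print(mean_by_hand)
print(mean_builtin)
```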

Q2: What is the median cost of attending university?

## [1] 30699

The median cost of attending university within the data is $30,699. The median represents the 50th percentile of the data set, which means half the universities cost more and half cost less than $30,699. Unlike the mean, the median is largely unaffected by outliers.
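The sketch below illustrates why the median is robust to outliers. It uses the six Cost values from the data preview, then replaces the largest one with a made-up extreme value (`cost_extreme` is hypothetical and exists only for this demonstration).

```r
# Cost for the first six colleges in the data preview (not the full sample).
cost <- c(22886, 24129, 15080, 22108, 19413, 28836)

# Hypothetical copy in which the largest cost is made ten times larger.
cost_extreme <- cost
cost_extreme[which.max(cost_extreme)] <- 288360

# The median depends only on the middle of the sorted values.
median_original <- median(cost)
median_extreme <- median(cost_extreme)

# The mean shifts upward because every value enters the sum.
mean_original <- mean(cost)
mean_extreme <- mean(cost_extreme)

print(c(median_original, median_extreme))
print(c(mean_original, mean_extreme))
```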

Q3: What is the variance of cost to attend university?

## [1] 233433900

The variance of cost to attend university is 233,433,900. Variance describes the spread of the data around the mean, but because it is measured in squared dollars, this number is difficult to interpret in isolation. Therefore, I will use the standard deviation to make better sense of the data.

Q4: What is the standard deviation of cost to attend university?

## [1] 15278.54

The standard deviation of cost to attend university is $15,278.54. Using the mean I calculated above, this means that about 68% of universities cost between $18,998.77 and $49,555.85, assuming a normal distribution. To find out whether this data set follows a normal distribution, I will create a density plot and analyze its shape.
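The relationship between variance, standard deviation, and the one-standard-deviation interval can be sketched as below. The `cost` vector contains only the six preview rows from Section 3, while the `reported_*` values are the full-sample figures quoted in Q1 and Q4; all names are illustrative.

```r
# Cost for the first six colleges in the data preview (not the full sample).
cost <- c(22886, 24129, 15080, 22108, 19413, 28836)

# Sample variance (divides by n - 1) and its square root.
cost_var <- var(cost)
cost_sd <- sd(cost)

# Interval one standard deviation either side of the mean.
lower <- mean(cost) - cost_sd
upper <- mean(cost) + cost_sd

# Observed share inside the interval, which needs no normality assumption.
share_within_1sd <- mean(cost >= lower & cost <= upper)

# The same interval arithmetic using the figures reported in Q1 and Q4.
reported_mean <- 34277.31
reported_sd <- 15278.54
reported_interval <- c(reported_mean - reported_sd, reported_mean + reported_sd)

print(c(cost_var, cost_sd))
print(share_within_1sd)
print(reported_interval)
```

Comparing the observed share with 68% is one informal way to see how far the data depart from a normal distribution.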

Q5: What is the 95th percentile of university cost?

##      95% 
## 64595.05

The 95th percentile of university cost is $64,595.05, meaning the most expensive 5% of universities cost $64,595.05 or more. This number is more than double the median university cost of $30,699.
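A percentile like this could be computed with `quantile()`, as sketched below on the six Cost values from the data preview. With so few values the result is interpolated between the two largest costs, so it is not comparable to the full-sample figure; the ratio at the end simply rechecks the "more than double" statement using the reported numbers.

```r
# Cost for the first six colleges in the data preview (not the full sample).
cost <- c(22886, 24129, 15080, 22108, 19413, 28836)

# 95th percentile; R interpolates linearly between order statistics by default.
p95 <- quantile(cost, probs = 0.95, na.rm = TRUE)

# Share of colleges that cost more than the 95th percentile.
share_above_p95 <- mean(cost > p95)

# Ratio of the reported 95th percentile to the reported median.
reported_ratio <- 64595.05 / 30699

print(p95)
print(share_above_p95)
print(reported_ratio)
```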

Q6: What is the correlation between cost and completion rate?

## [1] 0.5870019

The correlation between cost and completion rate is approximately 0.587. Because the correlation is greater than 0, the relationship is positive, and because its magnitude is well below 1, it is moderate rather than strong. As cost increases, completion rate tends to increase as well, but there are other factors at play.
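As a sketch of how this correlation might be computed, the example below uses the Cost and CompRate values from the six preview rows in Section 3 and checks the built-in result against the definition (covariance divided by the product of the standard deviations). The value it produces is for those six rows only.

```r
# Cost and completion rate for the first six colleges in the data preview.
cost <- c(22886, 24129, 15080, 22108, 19413, 28836)
comp_rate <- c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87)

# Pearson correlation; use = "complete.obs" skips rows with missing values.
r_builtin <- cor(cost, comp_rate, use = "complete.obs")

# The same quantity from its definition.
r_by_hand <- cov(cost, comp_rate) / (sd(cost) * sd(comp_rate))

print(r_builtin)
print(r_by_hand)
```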

Q7: What is the distribution of cost using a histogram?

The histogram above illustrates the distribution of university cost across bins that are each $5,000 wide. It also shows the frequency of high outliers above $70,000 and low outliers below $10,000. The modal bin of this histogram is $20,000 to $25,000, which contains roughly 340 universities.
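The binning behind a histogram like this can be sketched as follows, again using only the six Cost values from the data preview. The break points are chosen for this small example and are an assumption, not the ones used for the figure.

```r
# Cost for the first six colleges in the data preview (not the full sample).
cost <- c(22886, 24129, 15080, 22108, 19413, 28836)

# $5,000-wide bins covering the range of these six values.
bins <- seq(15000, 30000, by = 5000)

# plot = FALSE returns the bin counts without drawing the figure.
h <- hist(cost, breaks = bins, plot = FALSE)

# Locate the modal bin and its lower and upper edges.
modal_bin <- which.max(h$counts)
modal_range <- c(h$breaks[modal_bin], h$breaks[modal_bin + 1])

print(h$counts)
print(modal_range)
```

Dropping `plot = FALSE` draws the histogram itself.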

Q8: What is the density plot of cost?

Unlike the histogram shown above, the density plot of cost uses a smooth line to illustrate a continuous progression of cost. To answer my question from Q4, this data set does not follow a bell-shaped curve, as the density rises steeply from $10,000 to $20,000 before leveling out. The distribution is positively skewed, and the plot shows that the most expensive universities cost over $80,000.
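A density estimate of this kind could be obtained with `density()`, as in the sketch below, which uses the six Cost values from the data preview and R's default kernel and bandwidth (an assumption, since the report does not state the settings used). The last lines recheck the direction of skew using the mean and median reported in Q1 and Q2.

```r
# Cost for the first six colleges in the data preview (not the full sample).
cost <- c(22886, 24129, 15080, 22108, 19413, 28836)

# Kernel density estimate with the default Gaussian kernel.
d <- density(cost, na.rm = TRUE)

# Cost at which the estimated density is highest.
peak_cost <- d$x[which.max(d$y)]

# A mean above the median is consistent with positive (right) skew.
reported_mean <- 34277.31
reported_median <- 30699
mean_exceeds_median <- reported_mean > reported_median

print(peak_cost)
print(mean_exceeds_median)
```

Passing `d` to `plot()` draws the smooth curve.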

Q9: What is the relationship between cost and region using side-by-side boxplots?

As the side-by-side box-and-whisker plots above show, cost is related to regional location in the US. The thick black line represents the median, while the lower and upper edges of the blue boxes represent the 1st and 3rd quartiles, respectively. The blue box spans the interquartile range, which contains the middle 50% of the data set. The "whiskers" extend to the most extreme values that are not flagged as outliers (by default in R, values within 1.5 times the interquartile range of the box). Circles represent outliers, which fall beyond the whiskers.

Attending university in the Northeast is most expensive, with a median of about $40,000 and a 3rd quartile of about $55,000, followed by the Midwest, West, Southeast, and Territory regions.
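The elements of a box plot described above can be inspected numerically, as sketched below. The data frame holds only the six preview rows from Section 3, all of which are in the Southeast, so a single box results; with the full data set the same call would give one box per region. The name `college` is illustrative.

```r
# Region and Cost for the first six colleges in the data preview.
college <- data.frame(
  Region = rep("Southeast", 6),
  Cost = c(22886, 24129, 15080, 22108, 19413, 28836)
)

# plot = FALSE returns the box plot statistics without drawing.
b <- boxplot(Cost ~ Region, data = college, plot = FALSE)

# Rows of b$stats: lower whisker, lower hinge, median, upper hinge, upper whisker.
box_stats <- b$stats[, 1]

# Fences at 1.5 times the box height; points beyond them are drawn as circles.
box_height <- box_stats[4] - box_stats[2]
lower_fence <- box_stats[2] - 1.5 * box_height
upper_fence <- box_stats[4] + 1.5 * box_height

print(box_stats)
print(c(lower_fence, upper_fence))
print(b$out)
```

Dropping `plot = FALSE` draws the side-by-side boxes.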

Q10: What does the scatter plot matrix of cost vs. debt vs. completion rate look like?

In the scatterplot matrix shown above, cost, debt, and completion rate are compared in pairs simultaneously. Cost and debt have approximately zero correlation: as cost increases, debt remains about constant, although the highest debt values occur at costs of roughly $20,000 to $40,000.

Cost and completion rate have a moderate positive correlation, which supports my assessment in Q6: as cost goes up, so does completion rate. However, many universities at lower costs have similarly high completion rates, which weakens the correlation.

Debt and completion rate have essentially no correlation, as debt remains roughly constant across completion rates.
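A numeric companion to the scatterplot matrix is the pairwise correlation matrix, sketched below on the six preview rows from Section 3. The resulting values describe those six colleges only and should not be read as the full-sample correlations; the name `vars` is illustrative.

```r
# Cost, Debt, and CompRate for the first six colleges in the data preview.
vars <- data.frame(
  Cost = c(22886, 24129, 15080, 22108, 19413, 28836),
  Debt = c(1068, 3755, 109, 1347, 1294, 6430),
  CompRate = c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87)
)

# Correlation for every pair of variables, skipping missing values pairwise.
cor_matrix <- cor(vars, use = "pairwise.complete.obs")

print(round(cor_matrix, 3))
```

Calling `pairs(vars)` on the same data frame draws the scatterplot matrix itself.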

4. Conclusion

Across the data, the typical college or university costs about $30,000 to $34,000 per year to attend (the median and the mean, respectively). However, the spread is large, with the majority of universities costing from roughly $20,000 to $50,000 per year. The cost of attending university is positively skewed, as a substantial number of universities cost $50,000 to $70,000 per year.

More expensive universities tend to have higher completion rates, although this is an association rather than evidence that paying more causes students to finish, and a significant number of lower-cost universities still have high completion rates. Attending university in the Northeast costs the most, followed closely by universities in the Midwest, West, and Southeast, with higher education in the Territory region the most financially accessible. Cost does not have a strong correlation with debt after graduation.

In summary, if a graduating high schooler were deciding between several colleges and wanted to complete their program above all else, I would advise that a more expensive university is associated with better odds, but there are plenty of more affordable options with high completion rates as well. An upper-echelon university costing over $50,000 does not appear to bring a significant further improvement in completion rates, so a well-run medium-cost university at $15,000 to $30,000 would be more than adequate.

5. Citations