Background and Synopsis:
As a university student, I have been assigned by my professor to write a statistical report on college data to demonstrate my understanding of Chapter 6: Descriptive Statistics. Using the RStudio IDE, I will address 10 statistical questions, formulated collaboratively by ChatGPT and me, with RStudio’s calculation tools and visual aids. Finally, I will analyze each statistical method individually before summarizing my conclusions.
Purpose:
Gain experience in RStudio programming
Reinforce my knowledge of descriptive statistics
Analyze the data provided in a creative and insightful way
Variables:
##    Variable       Meaning
## 1  Name           Name of the school
## 2  State          State where school is located
## 3  ID             ID number for school
## 4  Main           Main campus? (1=yes, 0=branch campus)
## 5  Accred         Accreditation agency
## 6  MainDegree     Predominant undergrad degree (3=bachelors)
## 7  HighDegree     Highest degree (0=no degrees, 1=certificate, 2=associate, 3=bachelors, 4=graduate)
## 8  Control        Control of school (Private, Profit, Public)
## 9  Region         Region of country
## 10 Locale         Locale (City, Rural, Suburb, Town)
## 11 Latitude       Latitude
## 12 Longitude      Longitude
## 13 AdmitRate      Admission rate
## 14 CommuteAtlanta Commute to Atlanta
## 15 MidACT         Median ACT scores
## 16 AvgSAT         Average combined SAT scores
## 17 Online         Only online (distance) programs
## 18 Enrollment     Undergraduate enrollment
## 19 White          Percent of undergraduates who report being white
## 20 Black          Percent of undergraduates who report being black
## 21 Hispanic       Percent of undergraduates who report being Hispanic
## 22 Asian          Percent of undergraduates who report being Asian
## 23 Other          Percent of undergraduates who don’t report one of the above
## 24 PartTime       Percent of undergraduates who are part-time students
## 25 NetPrice       Average net price (cost minus aid)
## 26 Cost           Average total cost for tuition, room, board, etc.
## 27 TuitionIn      In-state tuition and fees
## 28 TuitionOut     Out-of-state tuition and fees
## 29 TuitionFTE     Net tuition revenue per FTE student
## 30 InstructFTE    Instructional spending per FTE student
## 31 FacSalary      Average monthly salary for full-time faculty
## 32 FullTimeFac    Percent of faculty that are full-time
## 33 Pell           Percent of students receiving Pell grants
## 34 CompRate       Completion rate (percent who finish program within 150% of normal time)
## 35 Debt           Average debt for students who complete program
## 36 Female         Percent of female students
## 37 FirstGen       Percent of first-generation students
## 38 MedIncome      Median family income (in $1,000)
Data Collection:
I use data on a sample of 2,012 American colleges and universities, collected by the Department of Education for the College Scorecard and accessed through the Lock5stat website (https://www.lock5stat.com/index.html).
Data Cleaning:
As instructed by my professor, Dr. Zhang, missing values will be ignored for the sake of simplicity. Outliers and data inconsistencies will not be addressed in this report. However, with a large sample of 2,012 colleges and universities, the influence of any individual outlier or inconsistency on the summary statistics should be limited.
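As a minimal sketch of what ignoring missing values looks like in R, the snippet below uses the six AdmitRate values from the data preview in this report, one of which is NA. The variable names are my own and only illustrative; this is not necessarily the exact code behind the results.

```r
# AdmitRate for the first six schools in the data preview; row 3 is missing (NA)
admit_rate <- c(0.9027, 0.9181, NA, 0.8123, 0.9787, 0.5330)

# Count how many values are missing
n_missing <- sum(is.na(admit_rate))

# Without na.rm, a single NA makes the whole result NA
mean_with_na <- mean(admit_rate)

# With na.rm = TRUE, the NA is dropped and the other five values are averaged
mean_no_na <- mean(admit_rate, na.rm = TRUE)
```

Most of the base R summary functions used in a report like this (mean, median, var, sd, quantile) accept the same na.rm argument.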
Descriptive Stat Methods:
My Questions: (Final 10 questions are bolded)
Based on my understanding of the data, I came up with the following questions:
ChatGPT Questions: (Final 10 questions are bolded)
## Name State ID Main
## 1 Alabama A & M University AL 100654 1
## 2 University of Alabama at Birmingham AL 100663 1
## 3 Amridge University AL 100690 1
## 4 University of Alabama in Huntsville AL 100706 1
## 5 Alabama State University AL 100724 1
## 6 The University of Alabama AL 100751 1
## Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
## MainDegree HighDegree Control Region Locale Latitude Longitude AdmitRate
## 1 3 4 Public Southeast City 34.78337 -86.56850 0.9027
## 2 3 4 Public Southeast City 33.50570 -86.79935 0.9181
## 3 3 4 Private Southeast City 32.36261 -86.17401 NA
## 4 3 4 Public Southeast City 34.72456 -86.64045 0.8123
## 5 3 4 Public Southeast City 32.36432 -86.29568 0.9787
## 6 3 4 Public Southeast City 33.21187 -87.54598 0.5330
## MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1 18 929 0 4824 2.5 90.7 0.9 0.2 5.6 6.6
## 2 25 1195 0 12866 57.8 25.9 3.3 5.9 7.1 25.2
## 3 NA NA 1 322 7.1 14.3 0.6 0.3 77.6 54.4
## 4 28 1322 0 6917 74.2 10.7 4.6 4.0 6.5 15.0
## 5 18 935 0 4189 1.5 93.8 1.0 0.3 3.5 7.7
## 6 28 1278 0 32387 78.5 10.1 4.7 1.2 5.6 7.9
## NetPrice Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1 15184 22886 9857 18236 9227 7298 6983
## 2 17535 24129 8328 19032 11612 17235 10640
## 3 9649 15080 6900 6900 14738 5265 3866
## 4 19986 22108 10280 21480 8727 9748 9391
## 5 12874 19413 11068 19396 9003 7983 7399
## 6 21973 28836 10780 28100 13574 10894 10016
## FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1 71.3 71.0 23.96 1068 56.4 36.6 23.6
## 2 89.9 35.3 52.92 3755 63.9 34.1 34.5
## 3 100.0 74.2 18.18 109 64.9 51.3 15.0
## 4 64.6 27.7 48.62 1347 47.6 31.0 44.8
## 5 54.2 73.8 27.69 1294 61.3 34.3 22.1
## 6 74.0 18.0 67.87 6430 61.5 22.6 66.7
## [1] 34277.31
The average cost of attending university in the data is $34,277.31. Also known as the mean, the average describes the center of the data and is found by adding up all costs and dividing by the number of universities. Unlike the median, the mean is pulled toward extreme values, so outliers affect it.
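The "add up and divide" definition can be checked against R's built-in function. This sketch uses only the six Cost values from the data preview, so its result differs from the full-sample mean; the variable names are illustrative assumptions.

```r
# Cost for the first six schools in the data preview
cost <- c(22886, 24129, 15080, 22108, 19413, 28836)

# Mean by definition: sum of all costs divided by the number of schools
mean_by_hand <- sum(cost) / length(cost)

# Built-in equivalent
mean_builtin <- mean(cost)

# On the full data, assuming a data frame called `college` (hypothetical name),
# the call would look like: mean(college$Cost, na.rm = TRUE)
```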
## [1] 30699
The median cost of attending university within the data is $30,699. The median is the 50th percentile of the data set: half the universities cost more than $30,699 and half cost less. Compared with the mean, the median is much less sensitive to outliers.
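To illustrate the difference in outlier sensitivity, the sketch below adds one made-up expensive school to the six Cost values from the data preview and compares how far the mean and the median move. The $80,000 value is hypothetical and chosen only for illustration.

```r
# Cost for the first six schools in the data preview
cost <- c(22886, 24129, 15080, 22108, 19413, 28836)

# Same data plus one hypothetical very expensive school
cost_with_outlier <- c(cost, 80000)

# How far each measure of center moves when the outlier is added
mean_shift <- mean(cost_with_outlier) - mean(cost)
median_shift <- median(cost_with_outlier) - median(cost)
```

In this small example the mean moves by several thousand dollars while the median moves by only a few hundred.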
## [1] 233433900
The variance of the cost to attend university is 233,433,900 squared dollars. Variance describes the spread of the data around the mean, but because it is measured in squared units, this number is difficult to interpret in isolation. Therefore, I will use the standard deviation to make better sense of the data.
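For reference, the sample variance is the sum of squared deviations from the mean divided by n - 1. A small sketch on the six Cost values from the data preview (variable names are illustrative) shows that this matches var() and that its square root is the standard deviation:

```r
# Cost for the first six schools in the data preview
cost <- c(22886, 24129, 15080, 22108, 19413, 28836)

# Sample variance by definition: squared deviations from the mean, divided by n - 1
deviations <- cost - mean(cost)
var_by_hand <- sum(deviations^2) / (length(cost) - 1)

# Built-in equivalents; the standard deviation is the square root of the variance
var_builtin <- var(cost)
sd_from_var <- sqrt(var_builtin)
```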
## [1] 15278.54
The standard deviation of the cost to attend university is $15,278.54. Using the mean I calculated above, this means that about 68% of universities cost between $18,998.77 and $49,555.85, assuming a normal distribution. To find out whether this data set follows a normal distribution, I will create a density plot and analyze its shape.
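The interval above is the mean plus or minus one standard deviation, and the 68% figure comes from the normal distribution. The arithmetic can be sketched in R using the values reported in this section:

```r
# Values reported in this report
mean_cost <- 34277.31
sd_cost <- 15278.54
var_cost <- 233433900

# One standard deviation on either side of the mean
lower <- mean_cost - sd_cost
upper <- mean_cost + sd_cost

# Share of a normal distribution within one standard deviation of its mean
share_within_1sd <- pnorm(1) - pnorm(-1)

# The standard deviation is the square root of the variance
sd_from_var <- sqrt(var_cost)
```

The share works out to roughly 0.68, which only applies to the cost data to the extent that the data are approximately normal.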
## 95%
## 64595.05
The 95th percentile of university cost is $64,595.05, meaning the most expensive 5% of universities cost $64,595.05 or more. This number is more than double the median university cost of $30,699.
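A percentile like this can be computed with quantile(). The sketch below applies it to the six Cost values from the data preview, so the result is not the full-sample figure, and then checks the "more than double" comparison using the reported numbers. Variable names are illustrative.

```r
# Cost for the first six schools in the data preview
cost <- c(22886, 24129, 15080, 22108, 19413, 28836)

# 95th and 50th percentiles; names = FALSE drops the "95%" style label
p95 <- quantile(cost, probs = 0.95, names = FALSE)
p50 <- quantile(cost, probs = 0.50, names = FALSE)

# Ratio of the reported 95th percentile to the reported median
ratio_reported <- 64595.05 / 30699
```

By default, quantile() interpolates between neighboring data values, which is why a percentile does not have to equal any single school's cost.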
## [1] 0.5870019
The correlation between cost and completion rate is 0.5870019. Because the correlation is greater than 0, the relationship is positive; at about 0.59, it is moderate rather than strong. As cost increases, completion rate tends to increase as well, but other factors are at play.
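As a sketch of what the correlation coefficient measures, the code below computes Pearson's r by hand (covariance divided by the product of the standard deviations) on the six rows from the data preview and compares it with cor(). It also squares the reported correlation, which gives the share of variation in one variable that a straight-line fit on the other would account for. Variable names are illustrative.

```r
# Cost and CompRate for the first six schools in the data preview
cost <- c(22886, 24129, 15080, 22108, 19413, 28836)
comp_rate <- c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87)

# Pearson correlation by definition
r_by_hand <- cov(cost, comp_rate) / (sd(cost) * sd(comp_rate))

# Built-in equivalent
r_builtin <- cor(cost, comp_rate)

# Squared value of the correlation reported for the full data
r_squared_reported <- 0.5870019^2
```

The squared value is roughly 0.34, which is one way to quantify "other factors at play".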
The histogram above illustrates the distribution of university cost in bins of $5,000. It shows the frequency of high outliers above $70,000 and low outliers below $10,000. The modal bin is $20,000 to $25,000, which contains roughly 340 universities.
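The binning behind a histogram like this can be sketched with hist(). The example uses the six Cost values from the data preview and $5,000-wide bins; the break points and variable names are my own choices for illustration, and plot = FALSE returns the counts without drawing anything.

```r
# Cost for the first six schools in the data preview
cost <- c(22886, 24129, 15080, 22108, 19413, 28836)

# Bin edges every $5,000, wide enough to cover these six values
breaks <- seq(15000, 30000, by = 5000)

# Compute the bin counts without drawing the plot
h <- hist(cost, breaks = breaks, plot = FALSE)

# The modal bin is the one with the highest count
modal_bin <- which.max(h$counts)
modal_range <- c(h$breaks[modal_bin], h$breaks[modal_bin + 1])
```

Dropping plot = FALSE draws the histogram itself.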
In contrast to the histogram above, the density plot of cost uses a smooth line to illustrate a continuous progression of cost. To answer my question from Q4, this data set does not follow a bell-shaped curve: it rises steeply from $10,000 to $20,000 before leveling out. The distribution is positively skewed, and the most expensive universities cost over $80,000.
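A density curve can be estimated with density(), sketched here on the six Cost values from the data preview (variable names are illustrative). As a rough numeric companion to the visual check, a mean larger than the median is consistent with positive skew, which is the case for the values reported in this report.

```r
# Cost for the first six schools in the data preview
cost <- c(22886, 24129, 15080, 22108, 19413, 28836)

# Kernel density estimate with default settings; d$x is the grid, d$y the curve height
d <- density(cost)

# The area under a density curve should be close to 1 (simple Riemann sum)
step <- d$x[2] - d$x[1]
area <- sum(d$y) * step

# Reported mean minus reported median; a positive gap is consistent with right skew
mean_minus_median_reported <- 34277.31 - 30699
```

Calling plot(d) draws the estimated curve.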
As the side-by-side box-and-whisker plots above show, cost is related to regional location in the US. The thick black line represents the median, while the lower and upper edges of the blue boxes represent the 1st and 3rd quartiles, respectively. The blue box spans the interquartile range and contains the middle 50% of the data. The “whiskers” extend to the smallest and largest values that are not flagged as outliers. Circles represent outliers, which fall beyond the whiskers.
Attending university in the Northeast is most expensive, with a median of about $40,000 and a 3rd quartile of about $55,000, followed by the Midwest, West, Southeast, and Territory regions.
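The numbers behind a grouped box plot can be inspected with boxplot(..., plot = FALSE). In the sketch below, the Southeast costs are the six values from the data preview plus one made-up $80,000 school, and the Northeast costs are entirely hypothetical; the data frame name is also an assumption. R's default rule flags a point as an outlier when it lies more than 1.5 times the box height beyond the box.

```r
# Small illustrative data set; only the first six Southeast costs come from the data preview
college_small <- data.frame(
  Region = c(rep("Southeast", 7), rep("Northeast", 5)),
  Cost = c(22886, 24129, 15080, 22108, 19413, 28836, 80000,  # 80000 is hypothetical
           38000, 41000, 40000, 55000, 52000)                # hypothetical values
)

# Compute the box plot statistics per region without drawing the plot
b <- boxplot(Cost ~ Region, data = college_small, plot = FALSE)

# Each column of b$stats holds: lower whisker, 1st quartile, median, 3rd quartile, upper whisker
medians <- b$stats[3, ]
names(medians) <- b$names

# Points beyond the whiskers are reported separately as outliers
outliers <- b$out
```

Here the hypothetical $80,000 school is reported as an outlier, and the Southeast upper whisker stops at the largest remaining value instead.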
In the scatterplot matrix shown above, cost, debt, and completion rate are compared in pairs. Cost and debt have approximately zero correlation: as cost increases, debt remains about constant, although debt is highest at costs of around $20,000 to $40,000.
Cost and completion rate have a moderate positive correlation, which supports my assessment in Q6: as cost goes up, so does completion rate. However, many lower-cost universities have similarly high completion rates, which weakens the correlation.
Debt and completion rate have essentially no correlation, as debt remains roughly constant across completion rates.
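A correlation matrix is a numeric companion to a scatterplot matrix, giving one coefficient per pair of variables. The sketch below builds one from the six rows in the data preview, so the values will not match the full-sample relationships described above; the data frame name is an illustrative assumption.

```r
# Cost, Debt, and CompRate for the first six schools in the data preview
college6 <- data.frame(
  Cost = c(22886, 24129, 15080, 22108, 19413, 28836),
  Debt = c(1068, 3755, 109, 1347, 1294, 6430),
  CompRate = c(23.96, 52.92, 18.18, 48.62, 27.69, 67.87)
)

# Pairwise Pearson correlations; pairwise.complete.obs skips missing values pair by pair
cor_matrix <- cor(college6, use = "pairwise.complete.obs")
```

Calling pairs(college6) draws the corresponding scatterplot matrix.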
A typical college or university costs about $30,000 (median) to $34,000 (mean) to attend. However, there is large variation, with the majority of universities costing from $20,000 to $50,000 per year. The cost of attending university is positively skewed, as a substantial number of universities cost $50,000 to $70,000 per year.
More expensive universities tend to have higher completion rates; however, many lower-cost universities also have high completion rates. Attending university in the Northeast costs the most, followed closely by universities in the Midwest, Southeast, and West, with higher education in the Territory regions being the most financially accessible. Cost does not have a strong correlation with debt after graduation.
In summary, if a high school student asked me where they should attend college, I would advise them not to consider an expensive university unless they can comfortably afford it. Otherwise, a medium-cost university at $15,000 to $30,000 should do just fine.