We will use the data from https://www.lock5stat.com/datapage3e.html to examine statistics on four-year colleges and universities.
Here are 10 questions I will ask about this data:
1.) What is the mean enrollment of all the colleges in the data set?
2.) What is the median total cost (`Cost`) amongst all the colleges in the data set?
3.) What is the standard deviation of ACT scores (`MidACT`) amongst the colleges?
4.) What is the correlation between average SAT score and total cost amongst all the colleges in the data set?
5.) What is the minimum net tuition revenue per FTE student amongst all the colleges in the data set?
6.) What is the maximum average net price amongst all the colleges in the data set?
7.) What is the distribution of average monthly salary for full-time faculty amongst all the colleges in the data set?
8.) What is the correlation between the average debt for students who complete their program and the average total cost amongst all the colleges in the data set?
9.) What is the mean percent of faculty that are full-time amongst all the colleges in the data set?
10.) What is the range of the percent of undergraduates who report being white?
Let’s explore these questions in detail.
college = read.csv('https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv')
head(college)
## Name State ID Main
## 1 Alabama A & M University AL 100654 1
## 2 University of Alabama at Birmingham AL 100663 1
## 3 Amridge University AL 100690 1
## 4 University of Alabama in Huntsville AL 100706 1
## 5 Alabama State University AL 100724 1
## 6 The University of Alabama AL 100751 1
## Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
## MainDegree HighDegree Control Region Locale Latitude Longitude AdmitRate
## 1 3 4 Public Southeast City 34.78337 -86.56850 0.9027
## 2 3 4 Public Southeast City 33.50570 -86.79935 0.9181
## 3 3 4 Private Southeast City 32.36261 -86.17401 NA
## 4 3 4 Public Southeast City 34.72456 -86.64045 0.8123
## 5 3 4 Public Southeast City 32.36432 -86.29568 0.9787
## 6 3 4 Public Southeast City 33.21187 -87.54598 0.5330
## MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1 18 929 0 4824 2.5 90.7 0.9 0.2 5.6 6.6
## 2 25 1195 0 12866 57.8 25.9 3.3 5.9 7.1 25.2
## 3 NA NA 1 322 7.1 14.3 0.6 0.3 77.6 54.4
## 4 28 1322 0 6917 74.2 10.7 4.6 4.0 6.5 15.0
## 5 18 935 0 4189 1.5 93.8 1.0 0.3 3.5 7.7
## 6 28 1278 0 32387 78.5 10.1 4.7 1.2 5.6 7.9
## NetPrice Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1 15184 22886 9857 18236 9227 7298 6983
## 2 17535 24129 8328 19032 11612 17235 10640
## 3 9649 15080 6900 6900 14738 5265 3866
## 4 19986 22108 10280 21480 8727 9748 9391
## 5 12874 19413 11068 19396 9003 7983 7399
## 6 21973 28836 10780 28100 13574 10894 10016
## FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1 71.3 71.0 23.96 1068 56.4 36.6 23.6
## 2 89.9 35.3 52.92 3755 63.9 34.1 34.5
## 3 100.0 74.2 18.18 109 64.9 51.3 15.0
## 4 64.6 27.7 48.62 1347 47.6 31.0 44.8
## 5 54.2 73.8 27.69 1294 61.3 34.3 22.1
## 6 74.0 18.0 67.87 6430 61.5 22.6 66.7
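The preview above already shows `NA` entries (row 3 has no `AdmitRate`, `MidACT`, or `AvgSAT`), so it can help to count missing values per column before summarizing. The sketch below is only an illustration: `toy` is a hypothetical stand-in built from the first four rows printed above, not the full data set.

```r
# Hypothetical toy data: three columns from the first four rows shown by head(college)
toy <- data.frame(
  AdmitRate  = c(0.9027, 0.9181, NA, 0.8123),
  MidACT     = c(18, 25, NA, 28),
  Enrollment = c(4824, 12866, 322, 6917)
)

# is.na() marks each missing cell; colSums() counts them column by column
na_counts <- colSums(is.na(toy))
print(na_counts)
```

The same call on the full data would be `colSums(is.na(college))`.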
mean(college$Enrollment, na.rm=TRUE)
## [1] 4484.831
The mean enrollment of all the colleges in the data set is about 4,485 students.
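Every summary call in this report passes `na.rm = TRUE`, because some columns contain missing values. A minimal sketch of what that argument changes, using the `MidACT` values from the first four rows printed by `head(college)` (the vector name `act` is just for illustration):

```r
# MidACT for the first four colleges shown above; the third is missing
act <- c(18, 25, NA, 28)

# Without na.rm, a single NA makes the whole result NA
mean_with_na <- mean(act)

# With na.rm = TRUE, the NA is dropped before averaging
mean_without_na <- mean(act, na.rm = TRUE)

print(mean_with_na)
print(mean_without_na)
```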
median(college$Cost, na.rm = TRUE)
## [1] 30699
The median total cost amongst all the colleges in the data set is $30,699.
sd(college$MidACT, na.rm = TRUE)
## [1] 3.653612
The standard deviation of ACT scores amongst the colleges is about 3.65 points.
cor(college$AvgSAT, college$Cost, use = 'complete.obs')
## [1] 0.5373884
The correlation between average SAT score and total cost amongst all the colleges in the data set is about 0.54, a moderate positive relationship.
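The `use = 'complete.obs'` argument tells `cor()` to keep only the rows where both variables are present. A small sketch of that behavior, using the `AvgSAT` and `Cost` values from the six rows printed by `head(college)`; the vector names are illustrative, and six rows are far too few to say anything about the real correlation.

```r
# AvgSAT and Cost for the six colleges shown above; one SAT value is missing
sat  <- c(929, 1195, NA, 1322, 935, 1278)
cost <- c(22886, 24129, 15080, 22108, 19413, 28836)

# Default behavior: any NA in the inputs gives an NA result
r_default <- cor(sat, cost)

# complete.obs drops the incomplete pair before computing the correlation
r_complete <- cor(sat, cost, use = "complete.obs")

# Equivalent manual version: filter to complete pairs first
keep <- complete.cases(sat, cost)
r_manual <- cor(sat[keep], cost[keep])
```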
min(college$TuitionFTE, na.rm = TRUE)
## [1] 0
The minimum net tuition revenue per FTE student amongst all the colleges in the data set is $0.
max(college$NetPrice, na.rm = TRUE)
## [1] 55775
The maximum average net price amongst all the colleges in the data set is $55,775.
hist(college$FacSalary, main = 'Histogram of Average Monthly Salary for Full-Time Faculty', xlab = "Average monthly salary", col = "yellow")
This shows the distribution of average monthly salary for full-time
faculty amongst all the colleges in the data set.
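A histogram is easier to read alongside a few numbers. One possible companion, sketched here on the six `FacSalary` values printed by `head(college)` rather than the full column, is the five-number summary from `quantile()`:

```r
# FacSalary for the six colleges shown by head(college)
fac_salary <- c(6983, 10640, 3866, 9391, 7399, 10016)

# Minimum, quartiles, and maximum
five_num <- quantile(fac_salary, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
print(five_num)
```

On the full data the call would be `quantile(college$FacSalary, na.rm = TRUE)`, since those five probabilities are the default.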
cor(college$Cost, college$Debt, use = 'complete.obs')
## [1] -0.2144525
The correlation between the average debt for students who complete their program and the average total cost amongst all the colleges in the data set is about -0.21, a weak negative relationship.
mean(college$FullTimeFac, na.rm = TRUE)
## [1] 64.8313
The mean percent of faculty that are full-time amongst all the colleges in the data set is about 64.83%.
range(college$White, na.rm = TRUE)
## [1] 0 100
The range of percent of undergraduates who report being white is from 0% to 100%.
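Note that `range()` returns the minimum and maximum, not a single number. If the width of the range is wanted instead, `diff()` can be applied to the result. A brief sketch on the six `White` values printed by `head(college)` (the names `white`, `limits`, and `width` are illustrative):

```r
# White (percent) for the six colleges shown by head(college)
white <- c(2.5, 57.8, 7.1, 74.2, 1.5, 78.5)

# range() gives c(min, max); diff() turns that pair into max - min
limits <- range(white, na.rm = TRUE)
width  <- diff(limits)

print(limits)
print(width)
```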
Q1.) 4,484.831
Q2.) $30,699
Q3.) 3.653612 points
Q4.) 0.5373884
Q5.) $0
Q6.) $55,775
Q7.) see graph
Q8.) -0.2144525
Q9.) 64.8313%
Q10.) 0% to 100%