Introduction

We use the CollegeScores4yr dataset from https://www.lock5stat.com/datapage3e.html

I propose the following 10 questions based on my understanding of the data.

  1. What is the average admission rate (AdmitRate) of public vs. private schools?
  2. What are the mean and median of SAT scores (AvgSAT) among the colleges?
  3. What are the variance and standard deviation of tuition fees (TuitionIn) among the colleges?
  4. What is the standard deviation of completion rates (CompRate) among colleges?
  5. Create a histogram of tuition fees (TuitionIn) and identify the shape of the distribution.
  6. What is the correlation between tuition fees (TuitionIn) and graduation rates (CompRate)?
  7. What is the average combined SAT score (AvgSAT) for all colleges?
  8. Is there a correlation between the percentage of students who are part-time (PartTime) and the percentage of faculty who are full-time (FullTimeFac)?
  9. How does the net price (NetPrice) vary across colleges in different regions or states?
  10. What is the average faculty salary (FacSalary) at colleges with a completion rate (CompRate) above 80%?

Analysis

We will explore the questions in detail.

college <- read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")
head(college)
##                                  Name State     ID Main
## 1            Alabama A & M University    AL 100654    1
## 2 University of Alabama at Birmingham    AL 100663    1
## 3                  Amridge University    AL 100690    1
## 4 University of Alabama in Huntsville    AL 100706    1
## 5            Alabama State University    AL 100724    1
## 6           The University of Alabama    AL 100751    1
##                                                                Accred
## 1 Southern Association of Colleges and Schools Commission on Colleges
## 2 Southern Association of Colleges and Schools Commission on Colleges
## 3 Southern Association of Colleges and Schools Commission on Colleges
## 4 Southern Association of Colleges and Schools Commission on Colleges
## 5 Southern Association of Colleges and Schools Commission on Colleges
## 6 Southern Association of Colleges and Schools Commission on Colleges
##   MainDegree HighDegree Control    Region Locale Latitude Longitude AdmitRate
## 1          3          4  Public Southeast   City 34.78337 -86.56850    0.9027
## 2          3          4  Public Southeast   City 33.50570 -86.79935    0.9181
## 3          3          4 Private Southeast   City 32.36261 -86.17401        NA
## 4          3          4  Public Southeast   City 34.72456 -86.64045    0.8123
## 5          3          4  Public Southeast   City 32.36432 -86.29568    0.9787
## 6          3          4  Public Southeast   City 33.21187 -87.54598    0.5330
##   MidACT AvgSAT Online Enrollment White Black Hispanic Asian Other PartTime
## 1     18    929      0       4824   2.5  90.7      0.9   0.2   5.6      6.6
## 2     25   1195      0      12866  57.8  25.9      3.3   5.9   7.1     25.2
## 3     NA     NA      1        322   7.1  14.3      0.6   0.3  77.6     54.4
## 4     28   1322      0       6917  74.2  10.7      4.6   4.0   6.5     15.0
## 5     18    935      0       4189   1.5  93.8      1.0   0.3   3.5      7.7
## 6     28   1278      0      32387  78.5  10.1      4.7   1.2   5.6      7.9
##   NetPrice  Cost TuitionIn TuitonOut TuitionFTE InstructFTE FacSalary
## 1    15184 22886      9857     18236       9227        7298      6983
## 2    17535 24129      8328     19032      11612       17235     10640
## 3     9649 15080      6900      6900      14738        5265      3866
## 4    19986 22108     10280     21480       8727        9748      9391
## 5    12874 19413     11068     19396       9003        7983      7399
## 6    21973 28836     10780     28100      13574       10894     10016
##   FullTimeFac Pell CompRate Debt Female FirstGen MedIncome
## 1        71.3 71.0    23.96 1068   56.4     36.6      23.6
## 2        89.9 35.3    52.92 3755   63.9     34.1      34.5
## 3       100.0 74.2    18.18  109   64.9     51.3      15.0
## 4        64.6 27.7    48.62 1347   47.6     31.0      44.8
## 5        54.2 73.8    27.69 1294   61.3     34.3      22.1
## 6        74.0 18.0    67.87 6430   61.5     22.6      66.7

Q1: What is the average admission rate (AdmitRate) of public vs. private schools?

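# Overall mean admission rate across all colleges (not yet split by Control)
mean(college$AdmitRate, na.rm=TRUE)
## [1] 0.6702025
# Compare the distribution of admission rates across the Control categories
boxplot(AdmitRate ~ Control, data = college, xlab = "Control", ylab = "AdmitRate")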
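
The question asks for a comparison by school type, which calls for one mean per category of the Control column. A minimal sketch of that group-wise calculation follows. It uses a small made-up data frame (`toy`, hypothetical values) in place of `college`, and it assumes Control is the grouping variable; Control may contain categories beyond the two visible in the head() output.

```r
# Toy stand-in for the `college` data frame (values are made up for illustration).
toy <- data.frame(
  Control   = c("Public", "Private", "Public", "Private", "Public"),
  AdmitRate = c(0.90, 0.50, 0.70, NA, 0.80)
)

# Mean of AdmitRate within each Control category, ignoring missing values.
admit_by_control <- tapply(toy$AdmitRate, toy$Control, mean, na.rm = TRUE)
print(admit_by_control)
```

Replacing `toy` with `college` would give the per-category means for the real data.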

Q2: What are the mean and median of SAT scores (AvgSAT) among the colleges?

mean(college$AvgSAT, na.rm=TRUE)
## [1] 1135.25
median(college$AvgSAT, na.rm=TRUE)
## [1] 1121

Q3: What are the variance and standard deviation of tuition fees (TuitionIn) among the colleges?

var(college$TuitionIn, na.rm=TRUE)
## [1] 199665280
sd(college$TuitionIn, na.rm=TRUE)
## [1] 14130.3
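
The two results are linked: the variance is the square of the standard deviation, so the square root of 199665280 should agree with the printed 14130.3 up to rounding. A minimal sketch of that consistency check, using the six TuitionIn values visible in the head(college) output as a stand-in sample (the names `x`, `reported_var` and `reported_sd` are illustrative only):

```r
# Six TuitionIn values copied from the head(college) output; a stand-in sample only.
x <- c(9857, 8328, 6900, 10280, 11068, 10780)

# sd() is the square root of var(), so these agree up to floating point error.
stopifnot(isTRUE(all.equal(sd(x)^2, var(x))))

# Same check against the reported results: sqrt(variance) vs. the printed sd.
reported_var <- 199665280
reported_sd  <- 14130.3
abs(sqrt(reported_var) - reported_sd) < 0.01
```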

Q4: What is the standard deviation of completion rates (CompRate) among colleges?

sd(college$CompRate, na.rm=TRUE)
## [1] 21.12272

Q5: Create a histogram of tuition fees (TuitionIn) and identify the shape of the distribution.

hist(college$TuitionIn, main="Histogram of Tuition Fees", xlab="Tuition Fees", col="lightblue", border="black", breaks=20)
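
One rough way to back up a visual reading of the histogram is to compare the mean with the median: a mean well above the median suggests a right skew, and a mean well below it suggests a left skew. The sketch below is only a heuristic under that assumption. The helper `skew_direction` and the vector `toy_tuition` are hypothetical, and the check cannot detect features such as multiple peaks.

```r
# Hypothetical helper: classify skew direction from the sign of mean - median.
skew_direction <- function(x) {
  x <- x[!is.na(x)]                 # drop missing values first
  gap <- mean(x) - median(x)
  if (gap > 0) {
    "right-skewed"
  } else if (gap < 0) {
    "left-skewed"
  } else {
    "symmetric"
  }
}

# Made-up values with one large observation pulling the mean upward.
toy_tuition <- c(5000, 6000, 7000, 8000, 40000, NA)
skew_direction(toy_tuition)
```

Calling `skew_direction(college$TuitionIn)` would apply the same heuristic to the real data.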

Q6: What is the correlation between tuition fees (TuitionIn) and graduation rates (CompRate)?

correlation <- cor(college$TuitionIn, college$CompRate, use="complete.obs")
print(paste("Correlation between Tuition Fees and Graduation Rates:", correlation))
## [1] "Correlation between Tuition Fees and Graduation Rates: 0.547703932183766"
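
The argument use="complete.obs" matters here because both columns contain missing values: cor() then keeps only the rows where both variables are present. A minimal sketch of that behaviour on made-up vectors (`x`, `y`, `keep` and the result names are illustrative only):

```r
# Made-up vectors with missing values in different positions.
x <- c(1, 2, 3, 4, NA)
y <- c(2, 4, NA, 8, 10)

# Correlation using only rows where both x and y are observed.
r_complete <- cor(x, y, use = "complete.obs")

# The same result obtained by filtering the complete pairs by hand.
keep <- complete.cases(x, y)
r_manual <- cor(x[keep], y[keep])

isTRUE(all.equal(r_complete, r_manual))
```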

Q7: What is the average combined SAT score (AvgSAT) for all colleges?

avg_sat <- mean(college$AvgSAT, na.rm=TRUE)
print(paste("Average Combined SAT Score for ALL Colleges:", avg_sat))
## [1] "Average Combined SAT Score for ALL Colleges: 1135.24980422866"

Q8: Is there a correlation between the percentage of students who are part-time (PartTime) and the percentage of faculty who are full-time (FullTimeFac)?

correlation <- cor(college$PartTime, college$FullTimeFac, use="complete.obs")
print(paste("Correlation between Part-Time Students and Full-Time Faculty:", correlation))
## [1] "Correlation between Part-Time Students and Full-Time Faculty: -0.299398043343412"

Q9: How does the net price (NetPrice) vary across colleges in different regions or states?

boxplot(NetPrice ~ State, data = college, main = "Variation of Net Price Across States", xlab = "State", ylab = "Net Price ($)", col = "lightblue", las = 2, cex.axis = 0.7)
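
The question also mentions regions, and a per-region summary is easier to read than a boxplot with one box per state. A minimal sketch of a median net price per Region, using a small made-up data frame (`toy`, hypothetical values and region assignments) in place of `college`:

```r
# Toy stand-in for the `college` data frame (values are made up for illustration).
toy <- data.frame(
  Region   = c("Southeast", "Southeast", "West", "West", "West"),
  NetPrice = c(15184, 17535, 9649, NA, 19986)
)

# Median NetPrice per Region; the formula interface drops rows with NA by default.
net_by_region <- aggregate(NetPrice ~ Region, data = toy, FUN = median)
print(net_by_region)
```

With the real data, the same call with `data = college` would give one row per region, and `boxplot(NetPrice ~ Region, data = college)` would show the spread within each region.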

Q10: What is the average faculty salary (FacSalary) at colleges with a completion rate (CompRate) above 80%?

high_completion_data <- subset(college, CompRate>80)
avg_faculty_salary <- mean(high_completion_data$FacSalary, na.rm=TRUE)
print(paste("Average Faculty Salary at Colleges with Completion Rate above 80%:", avg_faculty_salary))
## [1] "Average Faculty Salary at Colleges with Completion Rate above 80%: 11182.9473684211"

Summary

Working with R was a real learning experience, but once you have repeated something a couple of times, you have it down. I was surprised at how easy everything was once I knew what to do. I will either accept the grade as is or, if possible, submit a more thorough report. I spent a lot of time teaching myself the ins and outs and should have reached out for help sooner. I will reach out sooner on future projects.