college <- read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv", stringsAsFactors = TRUE)

1. Introduction

Using the data provided by https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv, I propose 10 questions based on my understanding of the data and answer each one below.

2. Analysis

Let’s explore these questions in detail.

Q1: What is the average total cost (tuition + room + board) for four-year colleges?

mean(college$Cost, na.rm = TRUE)
## [1] 34277.31

According to the data, the average total cost for four-year colleges is $34,277.31.

Q2: What is the median admission rate across these colleges?

median(college$AdmitRate, na.rm = TRUE)
## [1] 0.69505

According to the data, the median admission rate across these 4-year colleges is 69.5%.

Q3: How variable are the average combined SAT scores among the colleges?

sd(college$AvgSAT, na.rm = TRUE)
## [1] 128.9077
hist(college$AvgSAT, breaks = 20, main="Histogram of AvgSAT", xlab="AvgSAT")

According to the data, the standard deviation of the average SAT scores is 128.9 points. The histogram shows a roughly bell-shaped pattern, with most colleges falling between 1000 and 1200.
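One rough numerical check on the bell-curve impression is to compare the share of values within one standard deviation of the mean against the roughly 68% expected for a normal distribution. The sketch below is only an illustration: `toy_sat` is a made-up vector standing in for `college$AvgSAT`, and the helper name `share_within_1sd` is hypothetical.

```r
# Hypothetical helper: share of non-missing values within 1 SD of the mean.
share_within_1sd <- function(x) {
  x <- x[!is.na(x)]                 # drop missing values first
  mean(abs(x - mean(x)) <= sd(x))   # proportion of TRUE values
}

# Made-up values for illustration only (NOT taken from the dataset).
toy_sat <- c(900, 1000, 1050, 1100, 1100, 1150, 1200, 1300, NA)
share_within_1sd(toy_sat)
```

Calling `share_within_1sd(college$AvgSAT)` would apply the same check to the real data; a result far from 0.68 would suggest the distribution is not close to normal.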

Q4: What is the relationship between cost and average SAT score?

plot(college$Cost, college$AvgSAT,
     xlab = "Total Cost ($)", ylab = "AvgSAT",
     main = "Cost vs AvgSAT")

cor(college$Cost, college$AvgSAT, use = "complete.obs")
## [1] 0.5373884

According to the data, the correlation between cost and average SAT score is r = 0.537, a moderate positive relationship: colleges that cost more tend to have higher average SAT scores.
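Squaring the correlation gives the coefficient of determination, which is the quantity that can be read as a percentage: the share of the variation in one variable accounted for by a straight-line fit on the other. A minimal worked calculation, using the correlation value reported above:

```r
r <- 0.5373884          # correlation between Cost and AvgSAT reported above
r_squared <- r^2        # coefficient of determination for a simple linear fit
round(r_squared, 3)
## [1] 0.289
```

So a straight-line relationship with cost accounts for roughly 29% of the variation in average SAT scores across these colleges.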

Q5: How does the completion (graduation) rate vary by type of institution (Public vs Private vs Profit)?

boxplot(CompRate ~ Control, data = college,
        xlab = "Institution Control",
        ylab = "Completion Rate",
        main = "Completion Rate by Control")

According to the data, Private and Public institutions, both hovering around a 50% completion rate, have much higher completion rates than Profit institutions.
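The boxplot comparison can be backed up with numbers by computing the median completion rate for each group. The sketch below uses a small made-up data frame `toy` in place of `college` (its values are illustrative only and do not come from the dataset); `aggregate()` is part of base R.

```r
# Made-up stand-in for `college`; values are illustrative only.
toy <- data.frame(
  Control  = c("Private", "Private", "Profit", "Profit", "Public", "Public"),
  CompRate = c(55, 60, 20, NA, 45, 50)
)

# Median CompRate per Control group. Rows with a missing CompRate are
# dropped by the formula interface's default na.action (na.omit).
group_medians <- aggregate(CompRate ~ Control, data = toy, FUN = median)
group_medians
```

Replacing `toy` with `college` in the `aggregate()` call would give the per-group medians that the boxplot draws as its center lines.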

Q6: How does average student debt relate to graduation rates among colleges?

plot(college$Debt, college$CompRate,
     xlab = "Average Student Debt ($)",
     ylab = "Graduation Rate (%)",
     main = "Average Debt vs Graduation Rate")

cor(college$Debt, college$CompRate, use = "complete.obs")
## [1] -0.15836
model <- lm(CompRate ~ Debt, data = college)

abline(model, col = "blue", lwd = 2)

According to the data, colleges with higher average student debt tend to have slightly lower graduation rates, but the relationship is weak (r = -0.158).
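The slope of the fitted line complements the correlation: it says how much the completion rate changes, on average, for each extra dollar of debt. A sketch of how to read it from an `lm` fit, using a made-up data frame `toy` (its name and values are assumptions for illustration, not results from the dataset):

```r
# Made-up stand-in for `college`; values are illustrative only.
toy <- data.frame(
  Debt     = c(10000, 15000, 20000, 25000, NA),
  CompRate = c(60, 55, 52, 45, 50)
)

fit <- lm(CompRate ~ Debt, data = toy)   # rows with NA are dropped by default
slope <- unname(coef(fit)["Debt"])       # change in CompRate per $1 of debt
slope * 1000                             # change in CompRate per $1,000 of debt
summary(fit)$r.squared                   # share of variance explained by the line
```

Applied to the `model` object fitted above, `coef(model)["Debt"]` and `summary(model)$r.squared` would report the same quantities for the real data.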

Q7: What percentage of the colleges have more than, say, 50% of faculty full-time?

prop <- mean(college$FullTimeFac > 50, na.rm = TRUE)
prop
## [1] 0.6933687

According to the data, 69.3% of colleges have more than half of their faculty working full-time.
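The calculation works because R treats `TRUE` as 1 and `FALSE` as 0, so the mean of a logical vector is the proportion of `TRUE` values; `na.rm = TRUE` removes colleges with a missing value from both the numerator and the denominator. A toy illustration with made-up values (`toy_fac` is not taken from the dataset):

```r
toy_fac <- c(30, 60, NA, 80, 50)     # made-up percentages, not from the dataset
above_half <- toy_fac > 50           # FALSE TRUE NA TRUE FALSE
mean(above_half, na.rm = TRUE)       # 2 of the 4 non-missing values exceed 50
## [1] 0.5
sum(is.na(toy_fac))                  # how many values were excluded
## [1] 1
```

Note that a value of exactly 50 does not count as "more than 50%". The same idiom can check other threshold statements in this analysis, for example `mean(college$Debt < 10000, na.rm = TRUE)` for the debt distribution in Q9.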

Q8: Is there a correlation between admission rate and completion rate?

plot(college$AdmitRate, college$CompRate,
     xlab = "Admission Rate",
     ylab = "Completion Rate",
     main = "Admit Rate vs Completion Rate")

cor(college$AdmitRate, college$CompRate, use = "complete.obs")
## [1] -0.3482341

According to the data, there is a moderate negative correlation (r = -0.348): colleges with higher admission rates tend to have lower completion rates.

Q9: What is the distribution of average debt for students who complete the program?

hist(college$Debt, breaks = 20, main = "Distribution of Student Debt for Completers",
     xlab = "Average Debt ($)")

According to the data, the distribution of average debt is fairly wide, with the majority of values falling below $10,000.

Q10: How does the median family income of students differ by region (Midwest, Northeast, Southeast, West, etc.)?

boxplot(MedIncome ~ Region, data = college,
        xlab = "Region",
        ylab = "Median Family Income (in $1,000)",
        main = "Median Family Income by Region")

According to the data, students at colleges in the Midwest and Northeast tend to have higher median family incomes.

3. Summary

As these 10 questions demonstrate, it is easy to make assumptions about 4-year colleges, but looking at the actual data gives more reliable answers. Organizing the data with summary statistics and plots, as in the examples above, also makes the findings easier to understand.

Q1: What is the average total cost (tuition + room + board) for four-year colleges?

According to the data for 4-year colleges, the average cost for a student to attend, including tuition, room, and board, is $34,277.31.

Q2: What is the median admission rate across these colleges?

According to the data, the median admission rate across these 4-year colleges is 69.5%. In other words, half of the colleges admit more than about 70% of their applicants and half admit fewer. Because this is a median of college-level rates, it does not mean that 70% of all applicants were accepted.

Q3: How variable are the average combined SAT scores among the colleges?

According to the data, the standard deviation of the average SAT scores is 128.9 points. The spread is much easier to visualize in a histogram: most colleges fall in the range of 1000 to 1200, and the histogram overall follows a roughly bell-shaped pattern.

Q4: What is the relationship between cost and average SAT score?

According to the data, the correlation between cost and average SAT score is r = 0.537, a moderate positive relationship. The scatterplot shows this well: as the cost of a college goes up, its average SAT score tends to go up as well.

Q5: How does the completion (graduation) rate vary by type of institution (Public vs Private vs Profit)?

According to the data, Private and Public institutions have much higher completion rates than Profit institutions. Both hover around a 50% completion rate, while Profit institutions are at roughly 25%, which is a striking gap.

Q6: How does average student debt relate to graduation rates among colleges?

According to the data, colleges with higher average student debt tend to have slightly lower graduation rates, although the correlation is weak (r = -0.158). I think this statistic is pretty interesting, because it suggests that higher debt does not go hand in hand with higher graduation rates. A correlation across colleges cannot show how much debt an individual student needs to graduate, though.

Q7: What percentage of the colleges have more than, say, 50% of faculty full-time?

According to the data, 69.3% of colleges have more than half of their faculty working full-time. This statistic doesn’t shock me; it seems reasonable that most colleges rely mainly on full-time faculty. Note that the figure counts colleges, not individual faculty members.

Q8: Is there a correlation between admission rate and completion rate?

According to the data, the higher the admission rate, the lower the completion rate tends to be (r = -0.348). This statistic doesn’t shock me either; colleges where more applicants are accepted, and where GPA matters a little less, tend to have more students who drop out or fail.

Q9: What is the distribution of average debt for students who complete the program?

According to the data, the distribution of average debt is fairly wide, with the majority of values falling below $10,000. Again, this suggests that going to college does not have to mean taking on an absurd amount of debt.

Q10: How does the median family income of students differ by region (Midwest, Northeast, Southeast, West, etc.)?

According to the data, students at colleges in the Midwest and Northeast tend to have higher median family incomes. However, most regions of the U.S. are fairly even, except for the Territories, which have a far lower median family income.

4. Appendix

#Q1 code:
mean(college$Cost, na.rm = TRUE)

#Q2 code:
median(college$AdmitRate, na.rm = TRUE)

#Q3 code:
sd(college$AvgSAT, na.rm = TRUE)
hist(college$AvgSAT, breaks = 20, main="Histogram of AvgSAT", xlab="AvgSAT")

#Q4 code:
plot(college$Cost, college$AvgSAT,
     xlab = "Total Cost ($)", ylab = "AvgSAT",
     main = "Cost vs AvgSAT")
cor(college$Cost, college$AvgSAT, use = "complete.obs")

#Q5 code:
boxplot(CompRate ~ Control, data = college,
        xlab = "Institution Control",
        ylab = "Completion Rate",
        main = "Completion Rate by Control")

#Q6 code:
plot(college$Debt, college$CompRate,
     xlab = "Average Student Debt ($)",
     ylab = "Graduation Rate (%)",
     main = "Average Debt vs Graduation Rate")

cor(college$Debt, college$CompRate, use = "complete.obs")

model <- lm(CompRate ~ Debt, data = college)

abline(model, col = "blue", lwd = 2)

#Q7 code:
prop <- mean(college$FullTimeFac > 50, na.rm = TRUE)
prop

#Q8 code:
plot(college$AdmitRate, college$CompRate,
     xlab = "Admission Rate",
     ylab = "Completion Rate",
     main = "Admit Rate vs Completion Rate")
cor(college$AdmitRate, college$CompRate, use = "complete.obs")

#Q9 code:
hist(college$Debt, breaks = 20, main = "Distribution of Student Debt for Completers",
     xlab = "Average Debt ($)")

#Q10 code:
boxplot(MedIncome ~ Region, data = college,
        xlab = "Region",
        ylab = "Median Family Income (in $1,000)",
        main = "Median Family Income by Region")