Introduction

In this assignment, we will use various statistical methods to examine data from the CollegeScores4yr data set.

Questions

  1. What is the average cost of college?
  2. What is the correlation between faculty salary and cost?
  3. What is the variance of debt?
  4. What is the average SAT score for private and public colleges?
  5. What is the standard deviation of the percentage of female students?
  6. What is the average admission rate for each control type (Public, Private, Profit)? (Mean, Bar Chart)
  7. How much variation is there in faculty salaries across schools?
  8. What is the standard deviation of the highest degree offered?
  9. What is the percentage breakdown of schools by control type (Private, Profit, Public)? (Pie Chart)
  10. What is the correlation between net price and in-state tuition? (Correlation, Scatterplot)

Analysis

We will now explore the above questions in detail.

Q1: What is the average cost of college?

The average cost of college is $34,277.31.
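The appendix passes `na.rm = TRUE` to `mean()` so that schools without a reported cost, if any, do not turn the whole result into `NA`. A minimal sketch with invented numbers (not taken from CollegeScores4yr) shows the difference:

```r
# Hypothetical costs for four schools; the third value is missing
cost <- c(21000, 48000, NA, 33500)

mean(cost)                # NA: one missing value makes the mean undefined
mean(cost, na.rm = TRUE)  # averages only the three observed values
```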

Q2: What is the correlation between faculty salary and cost?

The correlation between faculty salary and the total cost of attendance (the Cost variable) is 0.424201, a moderate positive relationship.
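The appendix calls `cor()` with `use = "complete.obs"`, which keeps only the schools that have both values. This sketch, on hypothetical salary and cost pairs, shows that the default returns `NA` when any value is missing and that `complete.obs` matches a manual filter with `complete.cases()`:

```r
# Hypothetical paired values; the third school has no salary
salary <- c(6000, 7500, NA, 9000, 8200)
cost   <- c(25000, 31000, 40000, 45000, 38000)

cor(salary, cost)                        # NA, because of the missing salary
cor(salary, cost, use = "complete.obs")  # drops the incomplete pair first

# Same result computed by hand on the complete pairs only
ok <- complete.cases(salary, cost)
cor(salary[ok], cost[ok])
```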

Q3: What is the variance of debt?

The variance of college debt is 28,740,171 in squared dollars, which corresponds to a standard deviation of about $5,361.

Q4: What is the average SAT score for private and public colleges?

The mean SAT score (AvgSAT) at public colleges is 1118.91, and the mean SAT score at private colleges is 1145.839.
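The appendix filters the data once per group with `filter()`. A base-R alternative computes every group mean in one call with `tapply()`. The sketch below uses an invented data frame that only borrows the `Control` and `AvgSAT` column names; applied to `db`, `tapply(db$AvgSAT, db$Control, mean, na.rm = TRUE)` should give the same two means plus one for for-profit schools.

```r
# Hypothetical toy data with the same column names as the appendix
toy <- data.frame(
  Control = c("Public", "Private", "Public", "Private", "Profit"),
  AvgSAT  = c(1100, 1200, 1140, NA, 1050)
)

# One mean per control type; na.rm = TRUE is passed through to mean()
groupMeans <- tapply(toy$AvgSAT, toy$Control, mean, na.rm = TRUE)
groupMeans
```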

Q5: What is the standard deviation of the percentage of female students?

The standard deviation of the percentage of female students across schools is 12.34421 percentage points.
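`sd()` returns the sample standard deviation, which divides the sum of squared deviations by n - 1 rather than n. A short sketch on five invented values, assuming `Female` records the percentage of students who are female, computes the formula by hand and checks it against `sd()`:

```r
# Hypothetical percentages of female students at five schools
female <- c(48, 55, 61, 70, 52)

n <- length(female)
# Sample SD: square root of squared deviations summed and divided by n - 1
manual_sd <- sqrt(sum((female - mean(female))^2) / (n - 1))

manual_sd
sd(female)  # base R's sd() uses the same n - 1 formula
```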

Q6: What is the average admission rate by control type?

The average admission rate is 0.7008028 at public colleges, 0.6499581 at private colleges, and 0.7448919 at for-profit colleges.
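These are per-group averages. An average across all schools combined is a different number: it equals the group means weighted by how many schools (with a recorded rate) fall in each group, not the simple average of the three group means. A sketch on invented values, borrowing the `Control` and `AdmitRate` column names from the appendix:

```r
# Hypothetical admission rates for six schools in three control groups
toy <- data.frame(
  Control   = c("Public", "Public", "Private", "Private", "Private", "Profit"),
  AdmitRate = c(0.70, 0.80, 0.50, 0.60, 0.70, 0.90)
)

overall    <- mean(toy$AdmitRate)                    # average across all schools
groupMeans <- tapply(toy$AdmitRate, toy$Control, mean)
groupSizes <- table(toy$Control)                     # schools per group

# Weighting each group mean by its group size recovers the overall mean
weighted <- sum(groupMeans * as.numeric(groupSizes[names(groupMeans)])) /
  sum(groupSizes)

c(overall = overall, weighted = weighted)
```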

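Q7: How much variation is there in faculty salaries across schools?

The variance of faculty salaries across schools is 6,568,988 in squared dollars. Its square root, a standard deviation of about $2,563, is easier to interpret because it is in the same units as the salaries themselves.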
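Variance is measured in squared units, which is why its raw value looks so large. A sketch with invented salaries shows that `sd()` is the square root of `var()` and that rescaling dollars to thousands of dollars shrinks the standard deviation by 1,000 but the variance by 1,000 squared:

```r
# Hypothetical faculty salaries in dollars
salary <- c(5200, 6100, 7400, 8800, 9900)

var(salary)        # squared dollars, hard to read directly
sd(salary)         # dollars, the same units as the data
sqrt(var(salary))  # identical to sd(salary)

# Rescale to thousands of dollars
sd(salary / 1000)   # SD divided by 1,000
var(salary / 1000)  # variance divided by 1,000^2
```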

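Q8: What is the standard deviation of the highest degree offered?

The standard deviation of the highest-degree-offered code (HighDegree) is 0.4724913. Because HighDegree stores a category as a number, this value describes how spread out the codes are rather than a measured quantity.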
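For a category stored as a numeric code, a frequency table often says more than a single spread value. The sketch below uses ten invented codes and assumes, for illustration only, that a larger code means a higher degree is offered; it prints the standard deviation next to the counts and percentages per code:

```r
# Hypothetical HighDegree codes for ten four-year schools
# (assumption: larger codes mean a higher degree is offered)
highDegree <- c(4, 4, 3, 4, 4, 3, 4, 4, 4, 3)

sd(highDegree)                                 # one spread number for the codes
tab <- table(highDegree)                       # schools per code
tab
round(prop.table(tab) * 100, 1)                # percent of schools per code
```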

Q9: What is the percentage breakdown of schools by control type?

Private colleges make up roughly 61.8% of all colleges in the database (the share remaining after the other two groups), public colleges make up 29.8%, and for-profit colleges make up 8.4%.
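Each share is a count divided by the total, times 100. Because every share is rounded to one decimal place, the three percentages need not add to exactly 100. A sketch on invented counts (not the real totals) uses `prop.table()` for the division:

```r
# Hypothetical counts of schools by control type
counts <- c(Private = 540, Public = 260, Profit = 73)

percentages <- round(prop.table(counts) * 100, 1)
percentages
sum(percentages)  # can differ slightly from 100 because of rounding
```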

Q10: What is the correlation between net price and in-state tuition?

The correlation between net price and in-state tuition is 0.7371491.
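`cor()` computes Pearson's r by default: the covariance of the two variables divided by the product of their standard deviations, which gives a unitless value between -1 and 1. A sketch on five invented schools computes the formula by hand and compares it with `cor()`:

```r
# Hypothetical net prices and in-state tuition (dollars) for five schools
netPrice  <- c(12000, 18000, 21000, 26000, 30000)
tuitionIn <- c(9000, 15000, 24000, 28000, 36000)

# Pearson correlation: sum of cross-deviations over (n - 1) * sd(x) * sd(y)
r_manual <- sum((netPrice - mean(netPrice)) * (tuitionIn - mean(tuitionIn))) /
  ((length(netPrice) - 1) * sd(netPrice) * sd(tuitionIn))

r_manual
cor(netPrice, tuitionIn)  # Pearson is cor()'s default method
```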

Appendix

#Loading packages and the database:
library(dplyr)  # provides filter(), used in Questions 4, 6, and 9
db <- read.csv("https://www.lock5stat.com/datasets3e/CollegeScores4yr.csv")

#Question 1 code:
mean(db$Cost, na.rm = TRUE)
## [1] 34277.31
#Question 2 code:
cor(db$FacSalary, db$Cost, use = "complete.obs")
## [1] 0.424201
#Question 3 code:
var(db$Debt, na.rm = TRUE)
## [1] 28740171
#Question 4 code:
publicColleges <- filter(db, Control == "Public")
privateColleges <- filter(db, Control == "Private")

# AvgSAT holds average SAT scores, so the variables are named for SAT
meanOfPublicSAT <- mean(publicColleges$AvgSAT, na.rm = TRUE)
meanOfPublicSAT
## [1] 1118.91
meanOfPrivateSAT <- mean(privateColleges$AvgSAT, na.rm = TRUE)
meanOfPrivateSAT
## [1] 1145.839
#Question 5 code:
sd(db$Female, na.rm = TRUE)
## [1] 12.34421
#Question 6 code:
publicColleges <- filter(db, Control == "Public")
privateColleges <- filter(db, Control == "Private")
profitColleges <- filter(db, Control == "Profit")

publicAdmission <- mean(publicColleges$AdmitRate, na.rm = TRUE)
privateAdmission <- mean(privateColleges$AdmitRate, na.rm = TRUE)
profitAdmission <- mean(profitColleges$AdmitRate, na.rm = TRUE)

publicAdmission
## [1] 0.7008028
privateAdmission
## [1] 0.6499581
profitAdmission
## [1] 0.7448919
barplot(c(publicAdmission, privateAdmission, profitAdmission),
        names.arg = c("Public Colleges", "Private Colleges", "Profit Colleges"),
        xlab = "College Type",
        ylab = "Average Admission Rate",
        main = "Average Admission Rate Between College Types")

#Question 7 code:
var(db$FacSalary, na.rm = TRUE)
## [1] 6568988
sd(db$FacSalary, na.rm = TRUE)  # square root of the variance, about 2563
#Question 8 code:
sd(db$HighDegree, na.rm = TRUE)
## [1] 0.4724913
#Question 9 code:
publicColleges <- nrow(filter(db, Control == "Public"))
privateColleges <- nrow(filter(db, Control == "Private"))
profitColleges <- nrow(filter(db, Control == "Profit"))

collegeTypes <- c(publicColleges, privateColleges, profitColleges)
typeNames <- c("Public Colleges", "Private Colleges", "Profit Colleges")

percentages <- round((collegeTypes / sum(collegeTypes)) * 100, 1)

# paste0 avoids the stray space before "%" (e.g. "Public Colleges 29.8%")
labels <- paste0(typeNames, " ", percentages, "%")

pie(collegeTypes, labels = labels, main = "Distribution of College Types")

#Question 10 code:
cor(db$NetPrice,db$TuitionIn, use = "complete.obs")
## [1] 0.7371491