This report uses data from the IPEDS Data Center, restricted to degree-granting institutions in Minnesota that offer four-year or higher programs. It focuses on two variables: the number of students receiving bachelor's degrees and the number of students enrolled.

A simple linear regression is fitted with the number of students receiving bachelor's degrees as the response and the number of students enrolled as the predictor.

Load and clean the data:

# Degrees file: name the count column, then drop the first column
NumBach <- read.csv("C:/Users/Konzert/Documents/IPEDS/RecBach.csv", header = TRUE)
colnames(NumBach)[4] <- "ReceivingBachelorDegree"
NumBach <- NumBach[, -1]

# Enrollment file: name the count column
Enr <- read.csv("C:/Users/Konzert/Documents/IPEDS/Enrollment.csv", header = TRUE)
colnames(Enr)[4] <- "Enrolled"

Com <- merge(NumBach, Enr, by="institution.name")
# Response: bachelor's degrees awarded; predictor: enrollment
fit <- lm(ReceivingBachelorDegree ~ Enrolled, data = Com)
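
Because `merge()` performs an inner join by default, any institution whose name is spelled differently in the two files is silently dropped before the model is fitted. The sketch below shows one way to check for that. It uses made-up toy data (the college names and counts are hypothetical, not IPEDS values) and only base R.

```r
# Hypothetical toy data; names and counts are made up for illustration.
bach <- data.frame(
  institution.name = c("College A", "College B", "College C"),
  ReceivingBachelorDegree = c(120, 450, 80)
)
enr <- data.frame(
  institution.name = c("College A", "College B", "College D"),
  Enrolled = c(900, 3100, 1500)
)

# Default merge() is an inner join: only names present in both tables survive.
inner <- merge(bach, enr, by = "institution.name")

# all = TRUE keeps every institution and fills the gaps with NA,
# which makes unmatched names easy to spot.
outer <- merge(bach, enr, by = "institution.name", all = TRUE)
unmatched <- as.character(outer[!complete.cases(outer), "institution.name"])

print(nrow(inner))
print(unmatched)
```

Running the same check on `NumBach` and `Enr` would show whether any Minnesota institutions were lost in the join.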

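The correlation between enrollment and bachelor's degrees awarded is about 0.94. In a simple linear regression, R² is the square of this correlation (0.935² ≈ 0.87), so about 87% of the variation in the number of students receiving bachelor's degrees can be explained by the number of students enrolled: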

cor(Com$Enrolled, Com$ReceivingBachelorDegree)
## [1] 0.9352774
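
In a simple linear regression with one predictor, the share of variance explained (R²) equals the square of the Pearson correlation, so the value printed by `cor()` has to be squared before it is read as a percentage of variation. The sketch below checks this on synthetic data. The variable names, slope, and noise level are assumptions chosen for illustration, not the IPEDS figures.

```r
set.seed(1)  # synthetic data, not the IPEDS figures
enrolled <- runif(30, min = 500, max = 30000)
degrees  <- 0.2 * enrolled + rnorm(30, sd = 500)
toy <- data.frame(enrolled, degrees)

fit_toy <- lm(degrees ~ enrolled, data = toy)
r  <- cor(toy$enrolled, toy$degrees)
r2 <- summary(fit_toy)$r.squared

# The squared correlation and the model's R-squared agree
# up to floating-point error.
print(all.equal(r^2, r2))

# Squaring the correlation reported above:
print(round(0.9352774^2, 3))
```

The same number is available directly from the fitted model as `summary(fit)$r.squared`.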

Data Visualization:

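library(ggplot2)  # provides qplot(), geom_smooth(), and ggtitle()

# Scatter plot with one colour per institution
p <- qplot(Enrolled, ReceivingBachelorDegree, colour = institution.name, data = Com)
# aes(group = 1) fits a single regression line across all institutions
p <- p + geom_smooth(aes(group = 1), method = "lm", fullrange = TRUE)
# "\n" breaks the title onto two lines
p <- p + ggtitle("Number of Students Receiving Bachelor Degrees\nby Number of Students Enrolled")
p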