This report uses the IPEDS Data Center to query degree-granting, four-year-or-above colleges in Minnesota, focusing on the number of students receiving bachelor's degrees and the number of students enrolled.
A simple linear regression is fitted with the number of students receiving bachelor's degrees as the response and the number of students enrolled as the predictor.
Load and clean the data:
# Bachelor's degree counts: rename the count column, then drop the first column
NumBach <- read.csv("C:/Users/Konzert/Documents/IPEDS/RecBach.csv", header = TRUE)
colnames(NumBach)[4] <- "ReceivingBachelorDegree"
NumBach <- NumBach[, -1]
# Enrollment counts
Enr <- read.csv("C:/Users/Konzert/Documents/IPEDS/Enrollment.csv", header = TRUE)
colnames(Enr)[4] <- "Enrolled"
# Join the two tables on institution name
Com <- merge(NumBach, Enr, by = "institution.name")
# Response (degrees awarded) on the left of ~, predictor (enrollment) on the right
fit <- lm(ReceivingBachelorDegree ~ Enrolled, data = Com)
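As a rough illustration of what the merge and the fit do, the following sketch uses two small made-up data frames (the institution names and counts are invented, not IPEDS values). By default `merge()` keeps only rows whose key appears in both inputs, so an institution missing from either file is dropped, and in the model formula the response sits on the left of `~`.

```r
# Toy data: all values are invented for illustration only
bach <- data.frame(institution.name = c("A", "B", "C", "D"),
                   ReceivingBachelorDegree = c(100, 210, 290, 405))
enr <- data.frame(institution.name = c("A", "B", "C", "E"),
                  Enrolled = c(1000, 2000, 3000, 5000))

# Inner join: only institutions present in both data frames survive
toy <- merge(bach, enr, by = "institution.name")
nrow(toy)  # D and E are dropped

# Degrees awarded modeled as a function of enrollment
toy_fit <- lm(ReceivingBachelorDegree ~ Enrolled, data = toy)
coef(toy_fit)
```

Comparing `nrow()` before and after the merge is a quick way to see whether any institutions were lost because their names differ between the two files.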
The correlation between the two counts is about 0.94, so about 87% of the variation in the number of students receiving bachelor's degrees (0.935^2 is about 0.87) can be explained by the number of students enrolled:
cor(Com$Enrolled, Com$ReceivingBachelorDegree)
## [1] 0.9352774
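For a simple linear regression with an intercept, the coefficient of determination equals the squared correlation between the two variables. A minimal sketch with invented numbers (not the IPEDS data) that checks this relationship:

```r
# Invented values for illustration only
enrolled <- c(1200, 2500, 3100, 4800, 10500)
degrees  <- c(250, 480, 700, 900, 2300)

r  <- cor(enrolled, degrees)
r2 <- summary(lm(degrees ~ enrolled))$r.squared

# The two quantities agree up to floating-point error
all.equal(r^2, r2)
```

Applied to the correlation reported above, 0.9352774^2 is roughly 0.875.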
Data Visualization:
library(ggplot2)
p <- qplot(Enrolled, ReceivingBachelorDegree, colour = institution.name, data = Com)
# aes(group = 1) fits a single regression line across all institutions
p <- p + geom_smooth(aes(group = 1), method = "lm", fullrange = TRUE)
p <- p + ggtitle("Number of Students Receiving Bachelor Degrees\nby Number of Students Enrolled")
p
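`qplot()` is deprecated as of ggplot2 3.4.0. A sketch of a comparable plot built with `ggplot()`, assuming ggplot2 is installed; the small data frame `toy` is invented and only stands in for `Com`:

```r
library(ggplot2)

# Invented stand-in for the merged data frame
toy <- data.frame(
  institution.name = c("A", "B", "C", "D"),
  Enrolled = c(1000, 2000, 3000, 5000),
  ReceivingBachelorDegree = c(100, 210, 290, 520)
)

p2 <- ggplot(toy, aes(x = Enrolled, y = ReceivingBachelorDegree)) +
  # Colour is mapped only on the points, so the smoother sees one group
  geom_point(aes(colour = institution.name)) +
  geom_smooth(method = "lm", formula = y ~ x, fullrange = TRUE) +
  ggtitle("Number of Students Receiving Bachelor Degrees\nby Number of Students Enrolled")
p2
```

Mapping colour inside `geom_point()` rather than in the top-level `aes()` removes the need for `aes(group = 1)` on the smoother.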