Data Exploration
urlfile <- "https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/ISLR/College.csv"
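# stringsAsFactors = TRUE reproduces the factor summaries shown below on R >= 4.0
mydataset <- read.csv(urlfile, stringsAsFactors = TRUE)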
head(mydataset, 20)
## X Private Apps Accept Enroll
## 1 Abilene Christian University Yes 1660 1232 721
## 2 Adelphi University Yes 2186 1924 512
## 3 Adrian College Yes 1428 1097 336
## 4 Agnes Scott College Yes 417 349 137
## 5 Alaska Pacific University Yes 193 146 55
## 6 Albertson College Yes 587 479 158
## 7 Albertus Magnus College Yes 353 340 103
## 8 Albion College Yes 1899 1720 489
## 9 Albright College Yes 1038 839 227
## 10 Alderson-Broaddus College Yes 582 498 172
## 11 Alfred University Yes 1732 1425 472
## 12 Allegheny College Yes 2652 1900 484
## 13 Allentown Coll. of St. Francis de Sales Yes 1179 780 290
## 14 Alma College Yes 1267 1080 385
## 15 Alverno College Yes 494 313 157
## 16 American International College Yes 1420 1093 220
## 17 Amherst College Yes 4302 992 418
## 18 Anderson University Yes 1216 908 423
## 19 Andrews University Yes 1130 704 322
## 20 Angelo State University No 3540 2001 1016
## Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books
## 1 23 52 2885 537 7440 3300 450
## 2 16 29 2683 1227 12280 6450 750
## 3 22 50 1036 99 11250 3750 400
## 4 60 89 510 63 12960 5450 450
## 5 16 44 249 869 7560 4120 800
## 6 38 62 678 41 13500 3335 500
## 7 17 45 416 230 13290 5720 500
## 8 37 68 1594 32 13868 4826 450
## 9 30 63 973 306 15595 4400 300
## 10 21 44 799 78 10468 3380 660
## 11 37 75 1830 110 16548 5406 500
## 12 44 77 1707 44 17080 4440 400
## 13 38 64 1130 638 9690 4785 600
## 14 44 73 1306 28 12572 4552 400
## 15 23 46 1317 1235 8352 3640 650
## 16 9 22 1018 287 8700 4780 450
## 17 83 96 1593 5 19760 5300 660
## 18 19 40 1819 281 10100 3520 550
## 19 14 23 1586 326 9996 3090 900
## 20 24 54 4190 1512 5130 3592 500
## Personal PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate
## 1 2200 70 78 18.1 12 7041 60
## 2 1500 29 30 12.2 16 10527 56
## 3 1165 53 66 12.9 30 8735 54
## 4 875 92 97 7.7 37 19016 59
## 5 1500 76 72 11.9 2 10922 15
## 6 675 67 73 9.4 11 9727 55
## 7 1500 90 93 11.5 26 8861 63
## 8 850 89 100 13.7 37 11487 73
## 9 500 79 84 11.3 23 11644 80
## 10 1800 40 41 11.5 15 8991 52
## 11 600 82 88 11.3 31 10932 73
## 12 600 73 91 9.9 41 11711 76
## 13 1000 60 84 13.3 21 7940 74
## 14 400 79 87 15.3 32 9305 68
## 15 2449 36 69 11.1 26 8127 55
## 16 1400 78 84 14.7 19 7355 69
## 17 1598 93 98 8.4 63 21424 100
## 18 1100 48 61 12.1 14 7994 59
## 19 1320 62 66 11.5 18 10908 46
## 20 2000 60 62 23.1 5 4010 34
summary(mydataset)
## X Private Apps
## Abilene Christian University: 1 No :212 Min. : 81
## Adelphi University : 1 Yes:565 1st Qu.: 776
## Adrian College : 1 Median : 1558
## Agnes Scott College : 1 Mean : 3002
## Alaska Pacific University : 1 3rd Qu.: 3624
## Albertson College : 1 Max. :48094
## (Other) :771
## Accept Enroll Top10perc Top25perc
## Min. : 72 Min. : 35 Min. : 1.00 Min. : 9.0
## 1st Qu.: 604 1st Qu.: 242 1st Qu.:15.00 1st Qu.: 41.0
## Median : 1110 Median : 434 Median :23.00 Median : 54.0
## Mean : 2019 Mean : 780 Mean :27.56 Mean : 55.8
## 3rd Qu.: 2424 3rd Qu.: 902 3rd Qu.:35.00 3rd Qu.: 69.0
## Max. :26330 Max. :6392 Max. :96.00 Max. :100.0
##
## F.Undergrad P.Undergrad Outstate Room.Board
## Min. : 139 Min. : 1.0 Min. : 2340 Min. :1780
## 1st Qu.: 992 1st Qu.: 95.0 1st Qu.: 7320 1st Qu.:3597
## Median : 1707 Median : 353.0 Median : 9990 Median :4200
## Mean : 3700 Mean : 855.3 Mean :10441 Mean :4358
## 3rd Qu.: 4005 3rd Qu.: 967.0 3rd Qu.:12925 3rd Qu.:5050
## Max. :31643 Max. :21836.0 Max. :21700 Max. :8124
##
## Books Personal PhD Terminal
## Min. : 96.0 Min. : 250 Min. : 8.00 Min. : 24.0
## 1st Qu.: 470.0 1st Qu.: 850 1st Qu.: 62.00 1st Qu.: 71.0
## Median : 500.0 Median :1200 Median : 75.00 Median : 82.0
## Mean : 549.4 Mean :1341 Mean : 72.66 Mean : 79.7
## 3rd Qu.: 600.0 3rd Qu.:1700 3rd Qu.: 85.00 3rd Qu.: 92.0
## Max. :2340.0 Max. :6800 Max. :103.00 Max. :100.0
##
## S.F.Ratio perc.alumni Expend Grad.Rate
## Min. : 2.50 Min. : 0.00 Min. : 3186 Min. : 10.00
## 1st Qu.:11.50 1st Qu.:13.00 1st Qu.: 6751 1st Qu.: 53.00
## Median :13.60 Median :21.00 Median : 8377 Median : 65.00
## Mean :14.09 Mean :22.74 Mean : 9660 Mean : 65.46
## 3rd Qu.:16.50 3rd Qu.:31.00 3rd Qu.:10830 3rd Qu.: 78.00
## Max. :39.80 Max. :64.00 Max. :56233 Max. :118.00
##
table(mydataset$Private)
##
## No Yes
## 212 565
The summary above describes the U.S. News and World Report college data, which contains statistics for a large number of US colleges from the 1995 issue of US News and World Report. There are 777 observations of 18 variables, plus the college name in column X. The data set has 565 private colleges and 212 non-private colleges. The mean number of applications is 3002, whereas the mean number of acceptances is lower at 2019. The mean enrollment (780) is much lower still, well under half of the mean number of acceptances. I chose this data set because, as a higher education officer, I can see how these variables relate to one another.
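The gap between applications, acceptances, and enrollment can be expressed as two ratios: the acceptance rate (Accept / Apps) and the yield (Enroll / Accept). The sketch below is only an illustration. It retypes the first three rows of the head() output by hand instead of downloading the file, and the names sample_colleges, accept_rate, and yield are introduced here and are not part of the original analysis.

```r
# First three rows of the head() output above, typed in by hand
sample_colleges <- data.frame(
  X      = c("Abilene Christian University", "Adelphi University", "Adrian College"),
  Apps   = c(1660, 2186, 1428),
  Accept = c(1232, 1924, 1097),
  Enroll = c(721, 512, 336)
)

# Share of applicants who were accepted
sample_colleges$accept_rate <- sample_colleges$Accept / sample_colleges$Apps

# Share of accepted applicants who enrolled
sample_colleges$yield <- sample_colleges$Enroll / sample_colleges$Accept

print(round(sample_colleges[, c("accept_rate", "yield")], 2))
```

Applied to the rounded means reported above, the same ratios give roughly 67% of applications accepted (2019 / 3002) and about 39% of accepted applicants enrolling (780 / 2019). These are ratios of means, not means of the per-college ratios.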
Data Wrangling
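# Create a subset of the original table: non-private colleges only
# Columns 1, 3, 4, 5, 15, 19 are X, Apps, Accept, Enroll, Terminal, Grad.Rate
newmydataset <- mydataset[mydataset$Private == "No", c(1, 3, 4, 5, 15, 19)]
rownames(newmydataset) <- NULL  # renumber rows from 1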
head(newmydataset, 20)
## X Apps Accept Enroll
## 1 Angelo State University 3540 2001 1016
## 2 Appalachian State University 7313 4664 1910
## 3 Arizona State University Main campus 12809 10308 3761
## 4 Arkansas Tech University 1734 1729 951
## 5 Auburn University-Main Campus 7548 6791 3070
## 6 Bemidji State University 1208 877 546
## 7 Bloomsburg Univ. of Pennsylvania 6773 3028 1025
## 8 Bowling Green State University 9251 7333 3076
## 9 California Polytechnic-San Luis 7811 3817 1650
## 10 California State University at Fresno 4540 3294 1483
## 11 Castleton State College 1257 940 363
## 12 Central Connecticut State University 4158 2532 902
## 13 Central Missouri State University 4681 4101 1436
## 14 Central Washington University 2785 2011 1007
## 15 Christopher Newport University 883 766 428
## 16 Clemson University 8065 5257 2301
## 17 Clinch Valley Coll. of the Univ. of Virginia 689 561 250
## 18 College of Charleston 4772 3140 1265
## 19 College of William and Mary 7117 3106 1217
## 20 Colorado State University 9478 6312 2194
## Terminal Grad.Rate
## 1 62 34
## 2 96 70
## 3 93 48
## 4 60 48
## 5 91 69
## 6 62 46
## 7 68 75
## 8 89 67
## 9 81 59
## 10 90 61
## 11 91 79
## 12 73 49
## 13 80 50
## 14 89 51
## 15 82 48
## 16 88 73
## 17 67 46
## 18 78 51
## 19 92 93
## 20 89 59
The new dataset includes only the non-private colleges (212) with a few key variables: applications, acceptances, enrollment, Terminal (the percentage of faculty with a terminal degree), and graduation rate.
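Selecting columns by position works only while the column order stays fixed. A minimal sketch of the same filter using column names follows. It assumes a small hand-typed sample taken from the outputs above (three private rows and two non-private rows), and the names sample_colleges, keep_cols, and non_private are illustrative only.

```r
# Hand-typed sample: three private and two non-private rows from the outputs above
sample_colleges <- data.frame(
  X         = c("Abilene Christian University", "Adelphi University",
                "Adrian College", "Angelo State University",
                "Appalachian State University"),
  Private   = c("Yes", "Yes", "Yes", "No", "No"),
  Apps      = c(1660, 2186, 1428, 3540, 7313),
  Accept    = c(1232, 1924, 1097, 2001, 4664),
  Enroll    = c(721, 512, 336, 1016, 1910),
  Terminal  = c(78, 30, 66, 62, 96),
  Grad.Rate = c(60, 56, 54, 34, 70)
)

# Select columns by name so the code survives a change in column order
keep_cols <- c("X", "Apps", "Accept", "Enroll", "Terminal", "Grad.Rate")
non_private <- sample_colleges[sample_colleges$Private == "No", keep_cols]
rownames(non_private) <- NULL  # restart row numbering at 1

print(non_private)
```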
# Rename columns ("Terminations" holds Terminal, the percentage of faculty with a terminal degree)
names(newmydataset)<-c("College Name", "Applications", "Acceptence", "Enrollment", "Terminations", "Graduate Rate")
head(newmydataset, 20)
## College Name Applications Acceptence
## 1 Angelo State University 3540 2001
## 2 Appalachian State University 7313 4664
## 3 Arizona State University Main campus 12809 10308
## 4 Arkansas Tech University 1734 1729
## 5 Auburn University-Main Campus 7548 6791
## 6 Bemidji State University 1208 877
## 7 Bloomsburg Univ. of Pennsylvania 6773 3028
## 8 Bowling Green State University 9251 7333
## 9 California Polytechnic-San Luis 7811 3817
## 10 California State University at Fresno 4540 3294
## 11 Castleton State College 1257 940
## 12 Central Connecticut State University 4158 2532
## 13 Central Missouri State University 4681 4101
## 14 Central Washington University 2785 2011
## 15 Christopher Newport University 883 766
## 16 Clemson University 8065 5257
## 17 Clinch Valley Coll. of the Univ. of Virginia 689 561
## 18 College of Charleston 4772 3140
## 19 College of William and Mary 7117 3106
## 20 Colorado State University 9478 6312
## Enrollment Terminations Graduate Rate
## 1 1016 62 34
## 2 1910 96 70
## 3 3761 93 48
## 4 951 60 48
## 5 3070 91 69
## 6 546 62 46
## 7 1025 68 75
## 8 3076 89 67
## 9 1650 81 59
## 10 1483 90 61
## 11 363 91 79
## 12 902 73 49
## 13 1436 80 50
## 14 1007 89 51
## 15 428 82 48
## 16 2301 88 73
## 17 250 67 46
## 18 1265 78 51
## 19 1217 92 93
## 20 2194 89 59
Calculating the summary for the new dataset
summary(newmydataset)
## College Name Applications
## Angelo State University : 1 Min. : 233
## Appalachian State University : 1 1st Qu.: 2191
## Arizona State University Main campus: 1 Median : 4307
## Arkansas Tech University : 1 Mean : 5730
## Auburn University-Main Campus : 1 3rd Qu.: 7722
## Bemidji State University : 1 Max. :48094
## (Other) :206
## Acceptence Enrollment Terminations Graduate Rate
## Min. : 233 Min. : 153.0 Min. : 33.00 Min. : 10.00
## 1st Qu.: 1563 1st Qu.: 701.8 1st Qu.: 76.00 1st Qu.: 46.00
## Median : 2930 Median :1337.5 Median : 86.00 Median : 55.00
## Mean : 3919 Mean :1640.9 Mean : 82.82 Mean : 56.04
## 3rd Qu.: 5264 3rd Qu.:2243.8 3rd Qu.: 92.00 3rd Qu.: 65.00
## Max. :26330 Max. :6392.0 Max. :100.00 Max. :100.00
##
Some more calculations using the new dataset
# mean of applications
mean(newmydataset$Applications)
## [1] 5729.92
# mean of Graduation Rate
mean(newmydataset$`Graduate Rate`)
## [1] 56.04245
# Non-Private colleges with Graduation Rate higher than 90
newsub <- subset(newmydataset, `Graduate Rate` > 90)
newsub
## College Name Applications Acceptence Enrollment
## 19 College of William and Mary 7117 3106 1217
## 42 James Madison University 11223 5285 2082
## 65 Missouri Southern State College 1576 1326 913
## 186 University of Virginia 15849 5384 2678
## Terminations Graduate Rate
## 19 92 93
## 42 81 98
## 65 54 100
## 186 92 95
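To read the four high-graduation colleges side by side, it can help to add an acceptance percentage and sort by graduation rate. The sketch below retypes the four rows of newsub by hand with simplified column names. The names high_grad, accept_pct, and ranked are assumptions made for this illustration.

```r
# The four rows of newsub above, typed in by hand (column names simplified)
high_grad <- data.frame(
  college      = c("College of William and Mary", "James Madison University",
                   "Missouri Southern State College", "University of Virginia"),
  applications = c(7117, 11223, 1576, 15849),
  acceptances  = c(3106, 5285, 1326, 5384),
  grad_rate    = c(93, 98, 100, 95)
)

# Acceptance rate as a percentage of applications
high_grad$accept_pct <- round(100 * high_grad$acceptances / high_grad$applications, 1)

# Sort from highest to lowest graduation rate
ranked <- high_grad[order(high_grad$grad_rate, decreasing = TRUE), ]
print(ranked[, c("college", "grad_rate", "accept_pct")])
```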
Graphics
plot(mydataset$F.Undergrad, mydataset$Room.Board, xlab = 'Full-Time Undergraduates', ylab = 'Room and Board Cost', main = 'Full-Time Undergraduates vs. Room and Board Costs', col = 'red')

# `$Enroll` partially matches the renamed `Enrollment` column
enroll_hist <- hist(newmydataset$Enroll, freq = TRUE, xlab = 'Enrollment', ylab = 'Number of Colleges', main = 'Distribution of Enrollment at Non-Private Colleges', col = 'lightgreen')

enroll_hist
## $breaks
## [1] 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 5500 6000 6500
##
## $counts
## [1] 37 42 37 35 18 9 18 6 1 3 1 3 2
##
## $density
## [1] 3.490566e-04 3.962264e-04 3.490566e-04 3.301887e-04 1.698113e-04
## [6] 8.490566e-05 1.698113e-04 5.660377e-05 9.433962e-06 2.830189e-05
## [11] 9.433962e-06 2.830189e-05 1.886792e-05
##
## $mids
## [1] 250 750 1250 1750 2250 2750 3250 3750 4250 4750 5250 5750 6250
##
## $xname
## [1] "newmydataset$Enroll"
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
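The bin counts printed above can also be obtained without drawing a plot. The following sketch assumes a hand-typed sample of ten enrollment values from the non-private listing earlier and reuses the 500-wide bins from $breaks. The names enrollment, bin_edges, and enroll_bins are illustrative.

```r
# Enrollment for the first ten non-private colleges listed above (hand-typed sample)
enrollment <- c(1016, 1910, 3761, 951, 3070, 546, 1025, 3076, 1650, 1483)

# Same 500-wide bins as the $breaks shown above
bin_edges <- seq(0, 6500, by = 500)

# plot = FALSE returns the histogram object without drawing anything
enroll_bins <- hist(enrollment, breaks = bin_edges, plot = FALSE)

# One row per bin: midpoint and number of colleges
print(data.frame(mid = enroll_bins$mids, count = enroll_bins$counts))
```

Working from the object this way makes it easy to check that the counts add up to the number of colleges before interpreting the chart.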
# There are 4 non-private colleges with a graduation rate higher than 90. This box plot compares their application, acceptance, and enrollment counts.
# Terminations is a percentage, so it is left off the student-count axis.
boxplot(newsub$Applications, newsub$Acceptence, newsub$Enrollment, main = 'Non-Private Colleges with Graduation Rate Higher than 90', col = c("green", "purple", "yellow"), names = c("Applications", "Acceptances", "Enrollment"), ylab = "Number of Students")

Data Visualization
# Applications vs. Enrollment
library(ggplot2)
# Refer to columns by bare name inside aes(); the default coordinates and scales are sufficient
ggplot(data = newmydataset, aes(x = Applications, y = Enrollment)) + geom_point() + geom_smooth(method = 'lm')
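The smoothing layer with method = 'lm' draws a least-squares line through the points. A minimal sketch of inspecting that kind of fit directly with lm() is shown below. It assumes a hand-typed sample of the first eight non-private colleges listed earlier, so the coefficients will differ from a fit on all 212 rows. The names sample_np, fit, and slope are illustrative.

```r
# Applications and enrollment for the first eight non-private colleges listed above
sample_np <- data.frame(
  Applications = c(3540, 7313, 12809, 1734, 7548, 1208, 6773, 9251),
  Enrollment   = c(1016, 1910, 3761, 951, 3070, 546, 1025, 3076)
)

# Ordinary least squares: Enrollment as a linear function of Applications
fit <- lm(Enrollment ~ Applications, data = sample_np)

# The slope estimates additional enrolled students per additional application
slope <- unname(coef(fit)["Applications"])
print(slope)

# Share of the variance in enrollment explained by the line
print(summary(fit)$r.squared)
```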

# Enrollment vs. graduation rate for private colleges
newmydataset2 <- mydataset[mydataset$Private == "Yes", c(1, 3, 4, 5, 15, 19)]
ggplot(data = newmydataset2, aes(x = Enroll, y = Grad.Rate)) + geom_point() + stat_density2d()

Meaningful question for analysis
Compare the summaries of the private and non-private college datasets. Which group has the higher graduation rate?
# Summary of Non-Private Colleges
summary(newmydataset)
## College Name Applications
## Angelo State University : 1 Min. : 233
## Appalachian State University : 1 1st Qu.: 2191
## Arizona State University Main campus: 1 Median : 4307
## Arkansas Tech University : 1 Mean : 5730
## Auburn University-Main Campus : 1 3rd Qu.: 7722
## Bemidji State University : 1 Max. :48094
## (Other) :206
## Acceptence Enrollment Terminations Graduate Rate
## Min. : 233 Min. : 153.0 Min. : 33.00 Min. : 10.00
## 1st Qu.: 1563 1st Qu.: 701.8 1st Qu.: 76.00 1st Qu.: 46.00
## Median : 2930 Median :1337.5 Median : 86.00 Median : 55.00
## Mean : 3919 Mean :1640.9 Mean : 82.82 Mean : 56.04
## 3rd Qu.: 5264 3rd Qu.:2243.8 3rd Qu.: 92.00 3rd Qu.: 65.00
## Max. :26330 Max. :6392.0 Max. :100.00 Max. :100.00
##
# Summary of private colleges
summary(newmydataset2)
## X Apps Accept
## Abilene Christian University: 1 Min. : 81 Min. : 72
## Adelphi University : 1 1st Qu.: 619 1st Qu.: 501
## Adrian College : 1 Median : 1133 Median : 859
## Agnes Scott College : 1 Mean : 1978 Mean : 1306
## Alaska Pacific University : 1 3rd Qu.: 2186 3rd Qu.: 1580
## Albertson College : 1 Max. :20192 Max. :13007
## (Other) :559
## Enroll Terminal Grad.Rate
## Min. : 35.0 Min. : 24.00 Min. : 15
## 1st Qu.: 206.0 1st Qu.: 68.00 1st Qu.: 58
## Median : 328.0 Median : 81.00 Median : 69
## Mean : 456.9 Mean : 78.53 Mean : 69
## 3rd Qu.: 520.0 3rd Qu.: 92.00 3rd Qu.: 81
## Max. :4615.0 Max. :100.00 Max. :118
##
Answer: The mean graduation rate of non-private colleges is 56, whereas it is 69 for private colleges, so on average the graduation rate is higher at private colleges. Keep in mind that the private group is much larger (565 colleges versus 212), and that its maximum graduation rate of 118 exceeds 100 percent, which suggests a data-entry error in at least one record.
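The comparison can also be made in one step by grouping on Private instead of building two separate data frames. The sketch below assumes a hand-typed sample of six graduation rates taken from the outputs above (three private, three non-private), so its means are not the 56 and 69 reported for the full data. The names sample_rates, group_means, and group_sizes are illustrative.

```r
# Graduation rates for three private and three non-private colleges from the outputs above
sample_rates <- data.frame(
  Private   = c("Yes", "Yes", "Yes", "No", "No", "No"),
  Grad.Rate = c(60, 56, 54, 34, 70, 48)
)

# Mean graduation rate per level of Private
group_means <- tapply(sample_rates$Grad.Rate, sample_rates$Private, mean)

# Number of colleges per level, to keep group sizes in view
group_sizes <- table(sample_rates$Private)

print(group_means)
print(group_sizes)
```

On the full data the same call would take mydataset$Grad.Rate and mydataset$Private as its first two arguments.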