Data Exploration

urlfile <- "https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/ISLR/College.csv"
mydataset <- read.csv(urlfile, stringsAsFactors = TRUE)  # factors reproduce the summary() output below on R >= 4.0
head(mydataset, 20)
##                                          X Private Apps Accept Enroll
## 1             Abilene Christian University     Yes 1660   1232    721
## 2                       Adelphi University     Yes 2186   1924    512
## 3                           Adrian College     Yes 1428   1097    336
## 4                      Agnes Scott College     Yes  417    349    137
## 5                Alaska Pacific University     Yes  193    146     55
## 6                        Albertson College     Yes  587    479    158
## 7                  Albertus Magnus College     Yes  353    340    103
## 8                           Albion College     Yes 1899   1720    489
## 9                         Albright College     Yes 1038    839    227
## 10               Alderson-Broaddus College     Yes  582    498    172
## 11                       Alfred University     Yes 1732   1425    472
## 12                       Allegheny College     Yes 2652   1900    484
## 13 Allentown Coll. of St. Francis de Sales     Yes 1179    780    290
## 14                            Alma College     Yes 1267   1080    385
## 15                         Alverno College     Yes  494    313    157
## 16          American International College     Yes 1420   1093    220
## 17                         Amherst College     Yes 4302    992    418
## 18                     Anderson University     Yes 1216    908    423
## 19                      Andrews University     Yes 1130    704    322
## 20                 Angelo State University      No 3540   2001   1016
##    Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books
## 1         23        52        2885         537     7440       3300   450
## 2         16        29        2683        1227    12280       6450   750
## 3         22        50        1036          99    11250       3750   400
## 4         60        89         510          63    12960       5450   450
## 5         16        44         249         869     7560       4120   800
## 6         38        62         678          41    13500       3335   500
## 7         17        45         416         230    13290       5720   500
## 8         37        68        1594          32    13868       4826   450
## 9         30        63         973         306    15595       4400   300
## 10        21        44         799          78    10468       3380   660
## 11        37        75        1830         110    16548       5406   500
## 12        44        77        1707          44    17080       4440   400
## 13        38        64        1130         638     9690       4785   600
## 14        44        73        1306          28    12572       4552   400
## 15        23        46        1317        1235     8352       3640   650
## 16         9        22        1018         287     8700       4780   450
## 17        83        96        1593           5    19760       5300   660
## 18        19        40        1819         281    10100       3520   550
## 19        14        23        1586         326     9996       3090   900
## 20        24        54        4190        1512     5130       3592   500
##    Personal PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate
## 1      2200  70       78      18.1          12   7041        60
## 2      1500  29       30      12.2          16  10527        56
## 3      1165  53       66      12.9          30   8735        54
## 4       875  92       97       7.7          37  19016        59
## 5      1500  76       72      11.9           2  10922        15
## 6       675  67       73       9.4          11   9727        55
## 7      1500  90       93      11.5          26   8861        63
## 8       850  89      100      13.7          37  11487        73
## 9       500  79       84      11.3          23  11644        80
## 10     1800  40       41      11.5          15   8991        52
## 11      600  82       88      11.3          31  10932        73
## 12      600  73       91       9.9          41  11711        76
## 13     1000  60       84      13.3          21   7940        74
## 14      400  79       87      15.3          32   9305        68
## 15     2449  36       69      11.1          26   8127        55
## 16     1400  78       84      14.7          19   7355        69
## 17     1598  93       98       8.4          63  21424       100
## 18     1100  48       61      12.1          14   7994        59
## 19     1320  62       66      11.5          18  10908        46
## 20     2000  60       62      23.1           5   4010        34
summary(mydataset)
##                             X       Private        Apps      
##  Abilene Christian University:  1   No :212   Min.   :   81  
##  Adelphi University          :  1   Yes:565   1st Qu.:  776  
##  Adrian College              :  1             Median : 1558  
##  Agnes Scott College         :  1             Mean   : 3002  
##  Alaska Pacific University   :  1             3rd Qu.: 3624  
##  Albertson College           :  1             Max.   :48094  
##  (Other)                     :771                            
##      Accept          Enroll       Top10perc       Top25perc    
##  Min.   :   72   Min.   :  35   Min.   : 1.00   Min.   :  9.0  
##  1st Qu.:  604   1st Qu.: 242   1st Qu.:15.00   1st Qu.: 41.0  
##  Median : 1110   Median : 434   Median :23.00   Median : 54.0  
##  Mean   : 2019   Mean   : 780   Mean   :27.56   Mean   : 55.8  
##  3rd Qu.: 2424   3rd Qu.: 902   3rd Qu.:35.00   3rd Qu.: 69.0  
##  Max.   :26330   Max.   :6392   Max.   :96.00   Max.   :100.0  
##                                                                
##   F.Undergrad     P.Undergrad         Outstate       Room.Board  
##  Min.   :  139   Min.   :    1.0   Min.   : 2340   Min.   :1780  
##  1st Qu.:  992   1st Qu.:   95.0   1st Qu.: 7320   1st Qu.:3597  
##  Median : 1707   Median :  353.0   Median : 9990   Median :4200  
##  Mean   : 3700   Mean   :  855.3   Mean   :10441   Mean   :4358  
##  3rd Qu.: 4005   3rd Qu.:  967.0   3rd Qu.:12925   3rd Qu.:5050  
##  Max.   :31643   Max.   :21836.0   Max.   :21700   Max.   :8124  
##                                                                  
##      Books           Personal         PhD            Terminal    
##  Min.   :  96.0   Min.   : 250   Min.   :  8.00   Min.   : 24.0  
##  1st Qu.: 470.0   1st Qu.: 850   1st Qu.: 62.00   1st Qu.: 71.0  
##  Median : 500.0   Median :1200   Median : 75.00   Median : 82.0  
##  Mean   : 549.4   Mean   :1341   Mean   : 72.66   Mean   : 79.7  
##  3rd Qu.: 600.0   3rd Qu.:1700   3rd Qu.: 85.00   3rd Qu.: 92.0  
##  Max.   :2340.0   Max.   :6800   Max.   :103.00   Max.   :100.0  
##                                                                  
##    S.F.Ratio      perc.alumni        Expend        Grad.Rate     
##  Min.   : 2.50   Min.   : 0.00   Min.   : 3186   Min.   : 10.00  
##  1st Qu.:11.50   1st Qu.:13.00   1st Qu.: 6751   1st Qu.: 53.00  
##  Median :13.60   Median :21.00   Median : 8377   Median : 65.00  
##  Mean   :14.09   Mean   :22.74   Mean   : 9660   Mean   : 65.46  
##  3rd Qu.:16.50   3rd Qu.:31.00   3rd Qu.:10830   3rd Qu.: 78.00  
##  Max.   :39.80   Max.   :64.00   Max.   :56233   Max.   :118.00  
## 
table(mydataset$Private)
## 
##  No Yes 
## 212 565
The summary above describes the U.S. News and World Report college data: statistics for a large number of US colleges from the 1995 issue. There are 777 observations of 18 variables, plus the college name in column X. The data set has 565 private colleges and 212 non-private colleges. The mean number of applications is 3002, whereas the mean number of acceptances is lower at 2019. The mean enrollment (780) is far lower than the mean number of acceptances, so on average well under half of accepted students enroll. I chose this data set because, as a higher education officer, I am familiar with the relationships between these variables.
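As a rough check on the gap between applications, acceptances and enrollment, the ratio of the reported means gives an approximate acceptance share and enrollment yield. This is only a sketch: the three numbers are copied from the summary() output, the variable names are illustrative, and a ratio of means is not the same as the mean of per-college rates.

```r
# Means copied from the summary() output above
mean_apps   <- 3002
mean_accept <- 2019
mean_enroll <- 780

# Approximate share of applications accepted, and share of accepted students who enroll
accept_share <- mean_accept / mean_apps
enroll_yield <- mean_enroll / mean_accept

print(round(c(accept_share = accept_share, enroll_yield = enroll_yield), 2))
```

On these figures roughly two thirds of applications are accepted, and a little under 40 percent of accepted students enroll.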

Data Wrangling

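# Create a subset of the original table: non-private colleges, six columns selected by name
public_cols <- c("X", "Apps", "Accept", "Enroll", "Terminal", "Grad.Rate")
newmydataset <- mydataset[mydataset$Private == "No", public_cols]
rownames(newmydataset) <- NULL  # renumber rows from 1 so the row labels match the output below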
head(newmydataset, 20)
##                                                X  Apps Accept Enroll
## 1                        Angelo State University  3540   2001   1016
## 2                   Appalachian State University  7313   4664   1910
## 3           Arizona State University Main campus 12809  10308   3761
## 4                       Arkansas Tech University  1734   1729    951
## 5                  Auburn University-Main Campus  7548   6791   3070
## 6                       Bemidji State University  1208    877    546
## 7               Bloomsburg Univ. of Pennsylvania  6773   3028   1025
## 8                 Bowling Green State University  9251   7333   3076
## 9                California Polytechnic-San Luis  7811   3817   1650
## 10         California State University at Fresno  4540   3294   1483
## 11                       Castleton State College  1257    940    363
## 12          Central Connecticut State University  4158   2532    902
## 13             Central Missouri State University  4681   4101   1436
## 14                 Central Washington University  2785   2011   1007
## 15                Christopher Newport University   883    766    428
## 16                            Clemson University  8065   5257   2301
## 17 Clinch Valley Coll. of  the Univ. of Virginia   689    561    250
## 18                         College of Charleston  4772   3140   1265
## 19                   College of William and Mary  7117   3106   1217
## 20                     Colorado State University  9478   6312   2194
##    Terminal Grad.Rate
## 1        62        34
## 2        96        70
## 3        93        48
## 4        60        48
## 5        91        69
## 6        62        46
## 7        68        75
## 8        89        67
## 9        81        59
## 10       90        61
## 11       91        79
## 12       73        49
## 13       80        50
## 14       89        51
## 15       82        48
## 16       88        73
## 17       67        46
## 18       78        51
## 19       92        93
## 20       89        59

The new dataset includes only the non-private colleges (212) with a few important variables: applications, acceptances, enrollment, percent of faculty with a terminal degree, and graduation rate.
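The same kind of row-and-column selection can also be written with subset(), which some readers find easier to follow. The sketch below is self-contained: it rebuilds four rows (17 to 20) from the head() output earlier in this report, and the names sample_rows and public_rows are illustrative only.

```r
# Four rows copied from the head() output above (rows 17 to 20)
sample_rows <- data.frame(
  X = c("Amherst College", "Anderson University",
        "Andrews University", "Angelo State University"),
  Private   = c("Yes", "Yes", "Yes", "No"),
  Apps      = c(4302, 1216, 1130, 3540),
  Accept    = c(992, 908, 704, 2001),
  Enroll    = c(418, 423, 322, 1016),
  Terminal  = c(98, 61, 66, 62),
  Grad.Rate = c(100, 59, 46, 34)
)

# Keep only non-private colleges and the six columns of interest
public_rows <- subset(sample_rows, Private == "No",
                      select = c(X, Apps, Accept, Enroll, Terminal, Grad.Rate))
print(public_rows)
```

Selecting columns by name rather than by position keeps the code readable and guards against a change in column order.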

# Rename columns. Note: "Terminations" holds Terminal, the percent of faculty with a terminal degree.
names(newmydataset)<-c("College Name", "Applications", "Acceptence", "Enrollment", "Terminations", "Graduate Rate")
head(newmydataset, 20)
##                                     College Name Applications Acceptence
## 1                        Angelo State University         3540       2001
## 2                   Appalachian State University         7313       4664
## 3           Arizona State University Main campus        12809      10308
## 4                       Arkansas Tech University         1734       1729
## 5                  Auburn University-Main Campus         7548       6791
## 6                       Bemidji State University         1208        877
## 7               Bloomsburg Univ. of Pennsylvania         6773       3028
## 8                 Bowling Green State University         9251       7333
## 9                California Polytechnic-San Luis         7811       3817
## 10         California State University at Fresno         4540       3294
## 11                       Castleton State College         1257        940
## 12          Central Connecticut State University         4158       2532
## 13             Central Missouri State University         4681       4101
## 14                 Central Washington University         2785       2011
## 15                Christopher Newport University          883        766
## 16                            Clemson University         8065       5257
## 17 Clinch Valley Coll. of  the Univ. of Virginia          689        561
## 18                         College of Charleston         4772       3140
## 19                   College of William and Mary         7117       3106
## 20                     Colorado State University         9478       6312
##    Enrollment Terminations Graduate Rate
## 1        1016           62            34
## 2        1910           96            70
## 3        3761           93            48
## 4         951           60            48
## 5        3070           91            69
## 6         546           62            46
## 7        1025           68            75
## 8        3076           89            67
## 9        1650           81            59
## 10       1483           90            61
## 11        363           91            79
## 12        902           73            49
## 13       1436           80            50
## 14       1007           89            51
## 15        428           82            48
## 16       2301           88            73
## 17        250           67            46
## 18       1265           78            51
## 19       1217           92            93
## 20       2194           89            59

Calculating the summary for the new dataset

summary(newmydataset)
##                                College Name  Applications  
##  Angelo State University             :  1   Min.   :  233  
##  Appalachian State University        :  1   1st Qu.: 2191  
##  Arizona State University Main campus:  1   Median : 4307  
##  Arkansas Tech University            :  1   Mean   : 5730  
##  Auburn University-Main Campus       :  1   3rd Qu.: 7722  
##  Bemidji State University            :  1   Max.   :48094  
##  (Other)                             :206                  
##    Acceptence      Enrollment      Terminations    Graduate Rate   
##  Min.   :  233   Min.   : 153.0   Min.   : 33.00   Min.   : 10.00  
##  1st Qu.: 1563   1st Qu.: 701.8   1st Qu.: 76.00   1st Qu.: 46.00  
##  Median : 2930   Median :1337.5   Median : 86.00   Median : 55.00  
##  Mean   : 3919   Mean   :1640.9   Mean   : 82.82   Mean   : 56.04  
##  3rd Qu.: 5264   3rd Qu.:2243.8   3rd Qu.: 92.00   3rd Qu.: 65.00  
##  Max.   :26330   Max.   :6392.0   Max.   :100.00   Max.   :100.00  
## 

Some more calculations using the new dataset

# mean of applications
mean(newmydataset$Applications)
## [1] 5729.92
# mean of Graduation Rate
mean(newmydataset$`Graduate Rate`)
## [1] 56.04245
# Non-private colleges with a graduation rate higher than 90
newsub <- subset(newmydataset, `Graduate Rate` > 90)
newsub
##                        College Name Applications Acceptence Enrollment
## 19      College of William and Mary         7117       3106       1217
## 42         James Madison University        11223       5285       2082
## 65  Missouri Southern State College         1576       1326        913
## 186          University of Virginia        15849       5384       2678
##     Terminations Graduate Rate
## 19            92            93
## 42            81            98
## 65            54           100
## 186           92            95
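To compare these four colleges on a common scale, per-college acceptance and yield rates can be derived from the counts printed above. This is a minimal sketch: the figures are copied from that output, and the data frame top_public and its column names are illustrative, not part of the original analysis.

```r
# Counts copied from the newsub output above
top_public <- data.frame(
  college = c("College of William and Mary", "James Madison University",
              "Missouri Southern State College", "University of Virginia"),
  apps   = c(7117, 11223, 1576, 15849),
  accept = c(3106, 5285, 1326, 5384),
  enroll = c(1217, 2082, 913, 2678)
)

# Share of applicants accepted, and share of accepted students who enroll
top_public$accept_rate <- top_public$accept / top_public$apps
top_public$yield       <- top_public$enroll / top_public$accept

print(top_public[, c("college", "accept_rate", "yield")], digits = 2)
```

Among the four, Missouri Southern State College accepts by far the largest share of its applicants (1326 of 1576).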

Graphics

plot(mydataset$F.Undergrad, mydataset$Room.Board, xlab = 'Full-Time Undergraduates', ylab = 'Room and Board Costs', main = 'Full-Time Undergraduates vs. Room and Board Costs', col = 'red')

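# Histogram of enrollment at non-private colleges; `$Enroll` partially matches the renamed `Enrollment` column
enroll_hist <- hist(newmydataset$Enroll, freq = TRUE, xlab = 'Enrollment', ylab = 'Number of Colleges', main = 'Distribution of Enrollment at Non-Private Colleges', col = 'lightgreen')

enroll_hist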
## $breaks
##  [1]    0  500 1000 1500 2000 2500 3000 3500 4000 4500 5000 5500 6000 6500
## 
## $counts
##  [1] 37 42 37 35 18  9 18  6  1  3  1  3  2
## 
## $density
##  [1] 3.490566e-04 3.962264e-04 3.490566e-04 3.301887e-04 1.698113e-04
##  [6] 8.490566e-05 1.698113e-04 5.660377e-05 9.433962e-06 2.830189e-05
## [11] 9.433962e-06 2.830189e-05 1.886792e-05
## 
## $mids
##  [1]  250  750 1250 1750 2250 2750 3250 3750 4250 4750 5250 5750 6250
## 
## $xname
## [1] "newmydataset$Enroll"
## 
## $equidist
## [1] TRUE
## 
## attr(,"class")
## [1] "histogram"
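The breaks and counts printed above can be used directly to summarise the distribution. The sketch below copies the two vectors from that output (the variable names are illustrative) and computes the share of non-private colleges enrolling 2000 students or fewer, relying on the fact that hist() uses right-closed bins by default.

```r
# Bin edges and counts copied from the histogram output above
breaks <- seq(0, 6500, by = 500)
counts <- c(37, 42, 37, 35, 18, 9, 18, 6, 1, 3, 1, 3, 2)

# Total number of colleges in the histogram
total <- sum(counts)

# breaks[-1] holds the upper edge of each bin; keep bins ending at or below 2000
share_under_2000 <- sum(counts[breaks[-1] <= 2000]) / total

print(total)
print(round(share_under_2000, 2))
```

The counts sum to 212, matching the number of non-private colleges, and about 71 percent of them fall in the first four bins.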
# Four non-private colleges have a graduation rate higher than 90. This box plot compares their recruitment figures. Note that Terminations is a percentage, so it sits on a different scale from the three student counts.
boxplot(newsub$Applications, newsub$Acceptence, newsub$Enrollment, newsub$Terminations, main = 'Non-Private colleges with Graduation Rate higher than 90', col = c("green", "purple", "yellow", "blue"),names = c("Applications","Acceptence","Enrollment","Terminations"), ylab = "Number of Students")

Data Visualization

# Applications Vs. Enrollment
library(ggplot2)
ggplot(data = newmydataset, aes(x = Applications, y = Enrollment)) + geom_point() + geom_smooth(method = 'lm')

# Subset of private colleges with the same columns as the non-private subset
newmydataset2 <- mydataset[mydataset$Private == "Yes", c("X", "Apps", "Accept", "Enroll", "Terminal", "Grad.Rate")]

ggplot(data = newmydataset2, aes(x = Enroll, y = Grad.Rate)) + geom_point() + stat_density_2d()

Meaningful question for analysis

Compare the summaries of the private and non-private college datasets. Which group has the higher graduation rate?

# Summary of Non-Private Colleges
summary(newmydataset)
##                                College Name  Applications  
##  Angelo State University             :  1   Min.   :  233  
##  Appalachian State University        :  1   1st Qu.: 2191  
##  Arizona State University Main campus:  1   Median : 4307  
##  Arkansas Tech University            :  1   Mean   : 5730  
##  Auburn University-Main Campus       :  1   3rd Qu.: 7722  
##  Bemidji State University            :  1   Max.   :48094  
##  (Other)                             :206                  
##    Acceptence      Enrollment      Terminations    Graduate Rate   
##  Min.   :  233   Min.   : 153.0   Min.   : 33.00   Min.   : 10.00  
##  1st Qu.: 1563   1st Qu.: 701.8   1st Qu.: 76.00   1st Qu.: 46.00  
##  Median : 2930   Median :1337.5   Median : 86.00   Median : 55.00  
##  Mean   : 3919   Mean   :1640.9   Mean   : 82.82   Mean   : 56.04  
##  3rd Qu.: 5264   3rd Qu.:2243.8   3rd Qu.: 92.00   3rd Qu.: 65.00  
##  Max.   :26330   Max.   :6392.0   Max.   :100.00   Max.   :100.00  
## 
# Summary of private colleges

summary(newmydataset2)
##                             X            Apps           Accept     
##  Abilene Christian University:  1   Min.   :   81   Min.   :   72  
##  Adelphi University          :  1   1st Qu.:  619   1st Qu.:  501  
##  Adrian College              :  1   Median : 1133   Median :  859  
##  Agnes Scott College         :  1   Mean   : 1978   Mean   : 1306  
##  Alaska Pacific University   :  1   3rd Qu.: 2186   3rd Qu.: 1580  
##  Albertson College           :  1   Max.   :20192   Max.   :13007  
##  (Other)                     :559                                  
##      Enroll          Terminal        Grad.Rate  
##  Min.   :  35.0   Min.   : 24.00   Min.   : 15  
##  1st Qu.: 206.0   1st Qu.: 68.00   1st Qu.: 58  
##  Median : 328.0   Median : 81.00   Median : 69  
##  Mean   : 456.9   Mean   : 78.53   Mean   : 69  
##  3rd Qu.: 520.0   3rd Qu.: 92.00   3rd Qu.: 81  
##  Max.   :4615.0   Max.   :100.00   Max.   :118  
## 

Answer: The mean graduation rate of non-private colleges is 56, whereas it is 69 for private colleges, so the graduation rate is higher on average in private colleges. We need to keep in mind that the private subset is much larger (565 colleges) than the non-private subset (212 colleges). The private maximum of 118 also exceeds 100 percent, which suggests at least one data-entry error.
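As a consistency check, the two group means should combine, weighted by group size, into the overall mean graduation rate of 65.46 reported in the first summary. The sketch below uses the rounded figures printed in this report, so a small rounding difference is expected; the variable names are illustrative.

```r
# Group sizes and mean graduation rates copied from the output above
n_public     <- 212
n_private    <- 565
mean_public  <- 56.04
mean_private <- 69     # printed rounded in the summary() output

# Size-weighted mean across both groups
pooled_mean <- (n_public * mean_public + n_private * mean_private) /
  (n_public + n_private)

print(round(pooled_mean, 1))
```

The weighted value agrees with the overall mean to within rounding, which confirms that the two subsets together cover all 777 colleges.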