\(\color{deepskyblue}{\text{ 0. Data Description}}\)

This is an exploratory analysis of 777 colleges and universities in the US.

The dataset is from ‘U.S News and World Report’s College Data’.

The data come from the \(\color{red}{\text{ 1995}}\) issue of US News and World Report and are maintained at Carnegie Mellon University.

You can also get the data as the ‘College’ dataset in the ISLR package.

For the description : https://www.kaggle.com/flyingwombat/us-news-and-world-reports-college-data

\(\color{deepskyblue}{\text{ 1. Load Packages}}\)

library(ggplot2)
library(leaflet)
library(dplyr)
library(GGally)
library(rpart)
library(rpart.plot)

\(\color{deepskyblue}{\text{ 2. Looking at the Data}}\)

\(\color{deepskyblue}{\text{ 2-0. Glimpse}}\)

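College=read.csv('US_college_data.csv', stringsAsFactors=TRUE)  #Keep text columns as factors (no longer the default since R 4.0); the boxplots below need 'Private' as a factor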
glimpse(College)
Observations: 777
Variables: 19
$ X           <fct> Abilene Christian University, Adelphi University, ...
$ Private     <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, ...
$ Apps        <int> 1660, 2186, 1428, 417, 193, 587, 353, 1899, 1038, ...
$ Accept      <int> 1232, 1924, 1097, 349, 146, 479, 340, 1720, 839, 4...
$ Enroll      <int> 721, 512, 336, 137, 55, 158, 103, 489, 227, 172, 4...
$ Top10perc   <int> 23, 16, 22, 60, 16, 38, 17, 37, 30, 21, 37, 44, 38...
$ Top25perc   <int> 52, 29, 50, 89, 44, 62, 45, 68, 63, 44, 75, 77, 64...
$ F.Undergrad <int> 2885, 2683, 1036, 510, 249, 678, 416, 1594, 973, 7...
$ P.Undergrad <int> 537, 1227, 99, 63, 869, 41, 230, 32, 306, 78, 110,...
$ Outstate    <int> 7440, 12280, 11250, 12960, 7560, 13500, 13290, 138...
$ Room.Board  <int> 3300, 6450, 3750, 5450, 4120, 3335, 5720, 4826, 44...
$ Books       <int> 450, 750, 400, 450, 800, 500, 500, 450, 300, 660, ...
$ Personal    <int> 2200, 1500, 1165, 875, 1500, 675, 1500, 850, 500, ...
$ PhD         <int> 70, 29, 53, 92, 76, 67, 90, 89, 79, 40, 82, 73, 60...
$ Terminal    <int> 78, 30, 66, 97, 72, 73, 93, 100, 84, 41, 88, 91, 8...
$ S.F.Ratio   <dbl> 18.1, 12.2, 12.9, 7.7, 11.9, 9.4, 11.5, 13.7, 11.3...
$ perc.alumni <int> 12, 16, 30, 37, 2, 11, 26, 37, 23, 15, 31, 41, 21,...
$ Expend      <int> 7041, 10527, 8735, 19016, 10922, 9727, 8861, 11487...
$ Grad.Rate   <int> 60, 56, 54, 59, 15, 55, 63, 73, 80, 52, 73, 76, 74...
colnames(College)[1] = 'Name'  #Change a column name
College=College%>%mutate(Accept_Rate=Accept/Apps*100) #Add a column of acceptance rates (% of applicants accepted)
College=College%>%mutate(Enroll_Rate=Enroll/Accept*100) #Add a column of enrollment rates (% of accepted students who enroll)
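
As a quick sanity check on the two derived columns, here is a minimal base R sketch that recomputes them for the first three rows shown in the glimpse output. The data frame name `applicants` is made up for this illustration.

```r
# First three rows of Apps, Accept and Enroll, copied from the glimpse() output above
applicants <- data.frame(
  Apps   = c(1660, 2186, 1428),
  Accept = c(1232, 1924, 1097),
  Enroll = c(721, 512, 336)
)

# Acceptance rate: share of applicants who were accepted
applicants$Accept_Rate <- applicants$Accept / applicants$Apps * 100
# Enrollment rate: share of accepted students who actually enrolled
applicants$Enroll_Rate <- applicants$Enroll / applicants$Accept * 100

print(round(applicants, 1))
```

Note that the enrollment rate is measured against accepted students, not against all applicants.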

\(\color{deepskyblue}{\text{ 2-1. Good Students}}\)

  • In this data, we can say a university has good students if
    1. It has a low acceptance rate.
    2. Its students had high grades in high school.
table(College$Private)

 No Yes 
212 565 
Top10perc_top= arrange(College,desc(Top10perc)) %>% head(100); table(Top10perc_top$Private)

 No Yes 
 19  81 
Top10perc_bott=arrange(College,Top10perc)       %>% head(100); table(Top10perc_bott$Private)

 No Yes 
 42  58 

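Say a university has great students if its students did well in high school.

I used the column ‘Top10perc’ (the percentage of new students who were in the top 10% of their high school class). I didn’t use ‘Top25perc’, which is largely redundant.

Private schools make up 565 of the 777 colleges (about 73%), but 81 of the 100 colleges with the highest ‘Top10perc’ are private, so students with great high school grades go disproportionately to private schools.

Conversely, public schools are over-represented among the 100 colleges with the lowest ‘Top10perc’ (42 of 100, against about 27% overall).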
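
The comparison of the three tables can be made explicit by turning the counts into percentages. This is a minimal sketch that only reuses the counts printed above. The object names are made up for this illustration.

```r
# Counts of public (No) and private (Yes) schools, copied from the table() outputs above
counts <- rbind(
  Overall   = c(No = 212, Yes = 565),
  Top100    = c(No = 19,  Yes = 81),
  Bottom100 = c(No = 42,  Yes = 58)
)

# Row-wise proportions, then the private share of each group in percent
private_share <- prop.table(counts, margin = 1)[, "Yes"] * 100
print(round(private_share, 1))
```

Comparing each group with the overall share, rather than with 50%, is what shows that private schools are over-represented at the top and under-represented at the bottom.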

Top10perc_top[,c('Name','Accept_Rate')] 
                                           Name Accept_Rate
1         Massachusetts Institute of Technology    33.38013
2                           Harvey Mudd College    41.53958
3          University of California at Berkeley    41.52368
4                               Yale University    22.91453
5                               Duke University    28.23265
6                            Harvard University    15.61486
7                          Princeton University    15.44863
8               Georgia Institute of Technology    57.76445
9                              Brown University    25.73494
10                            Dartmouth College    26.47025
11                        Pepperdine University    53.31065
12                      Northwestern University    42.31426
13           University of California at Irvine    68.63932
14                   University of Pennsylvania    42.21397
15                              Amherst College    23.05904
16                             Williams College    29.74200
17                            Wellesley College    43.14335
18                     University of Notre Dame    48.05195
19                          Columbia University    28.56720
20                             Davidson College    40.28656
21                              Bowdoin College    30.36353
22                             Emory University    49.00071
23                             Carleton College    58.61173
24                     Johns Hopkins University    40.66557
25  University of North Carolina at Chapel Hill    41.00438
26                       Wake Forest University    42.25402
27                       University of Virginia    33.97060
28                            Bryn Mawr College    55.29010
29              Case Western Reserve University    81.40315
30                    Claremont McKenna College    41.23656
31                        Georgetown University    25.91993
32                        Vanderbilt University    60.19766
33                    College of the Holy Cross    56.47963
34                          New York University    53.28822
35                  College of William and Mary    43.64198
36                        University of Chicago    47.24323
37                Washington and Lee University    33.06184
38                  Birmingham-Southern College    73.04348
39          University of Michigan at Ann Arbor    67.56475
40                           Niagara University    80.90090
41                           Trinity University    74.96907
42                        Washington University    68.70917
43                          Agnes Scott College    83.69305
44                   Carnegie Mellon University    59.58983
45                              Scripps College    73.91813
46                           SUNY at Binghamton    42.63293
47                          Wesleyan University    41.34535
48                                Colby College    46.31320
49                       Polytechnic University    74.82332
50                               Rhodes College    79.53953
51                           Grove City College    44.56042
52                       Sarah Lawrence College    55.65217
53                      Transylvania University    96.04743
54                             Colorado College    49.17368
55                              Drew University    73.38597
56                            Furman University    90.28228
57                             Grinnell College    68.12163
58                           Macalester College    50.90167
59                        Oglethorpe University    81.94444
60                      SUNY College at Geneseo    53.05885
61            University of Minnesota at Morris    59.94513
62                      University of Rochester    62.71960
63                           Wheaton College IL    64.24581
64                               Centre College    87.66041
65                 Illinois Wesleyan University    44.00000
66                        Trenton State College    45.85482
67                        University of Florida    71.00040
68                              Barnard College    56.16987
69             Rensselaer Polytechnic Institute    83.36669
70                               Vassar College    52.87324
71                              Hendrix College    87.60632
72                           Occidental College    56.75559
73              University of Illinois - Urbana    77.99719
74                       Fresno Pacific College    79.19075
75                                Smith College    54.63248
76              Stevens Institute of Technology    70.64480
77                    University of Puget Sound    69.88131
78                              Wofford College    62.29181
79                                 Bard College    43.87435
80                            DePauw University    83.04915
81                     Florida State University    74.52579
82                          Lawrence University    76.18665
83                              Oberlin College    57.91126
84                          Bucknell University    58.23152
85           Texas A&M Univ. at College Station    72.67514
86                             Union College NY    48.98426
87              University of Missouri at Rolla    97.28290
88                        Valparaiso University    83.22892
89                        Willamette University    80.03619
90              Worcester Polytechnic Institute    83.59827
91                          Brandeis University    65.52795
92            Brigham Young University at Provo    73.34691
93                                 Knox College    81.25000
94                             Linfield College    76.10506
95         Pennsylvania State Univ. Main Campus    53.55423
96                University of Texas at Austin    64.88612
97              California Polytechnic-San Luis    48.86698
98                              Erskine College    84.52200
99                        Mount Holyoke College    73.00000
100                                Reed College    73.04171
Top10perc_bott[,'Accept_Rate']
  [1]  65.89018  73.12715  64.29942  82.76644  81.44192  86.74972  86.19626
  [8]  32.83333  80.31212  85.42141  69.35484  91.38889  68.95767  74.49210
 [15]  72.55507  78.37232 100.00000  75.03817  86.38743  59.65078  86.86391
 [22]  72.27926  93.47826  51.07143  60.89466  59.75309  93.83178  91.91919
 [29]  45.83333  79.75207  44.61207 100.00000  84.93243  80.51282  84.85222
 [36]  85.02857  48.86208  78.44828  89.15152  60.06662  82.47078  72.20826
 [43]  83.90805  92.89117  93.23006  89.24731  66.09628  89.65952  76.97183
 [50]  74.78123  89.23837  81.99802  61.71896  86.53516  86.95086  68.41004
 [57]  64.57019  79.33884  75.81951  85.51532  90.00000  87.60949  50.31847
 [64]  95.63492  80.73555  94.93088  89.97361  73.93365  81.25000  90.17981
 [71]  77.12245  84.13147  48.12310  72.84345  82.64642  88.91967  73.84418
 [78]  82.04038  84.79087  71.46933  92.88981  84.86683  83.25359  90.30172
 [85]  71.02473  70.73171  82.64642  77.59187  88.43683  68.17496  68.66894
 [92]  71.68079  82.34694  82.08232  76.49402  76.74419  76.87638  69.76190
 [99]  77.68496  70.00000
Top10perc_top[,c('Name','Enroll_Rate')]
                                           Name Enroll_Rate
1         Massachusetts Institute of Technology    50.37383
2                           Harvey Mudd College    31.11888
3          University of California at Berkeley    38.96025
4                               Yale University    53.68936
5                               Duke University    40.66273
6                            Harvard University    74.18014
7                          Princeton University    56.46425
8               Georgia Institute of Technology    50.27612
9                              Brown University    45.13739
10                            Dartmouth College    47.82226
11                        Pepperdine University    33.38243
12                      Northwestern University    36.57692
13           University of California at Irvine    22.99768
14                   University of Pennsylvania    47.09480
15                              Amherst College    42.13710
16                             Williams College    42.24900
17                            Wellesley College    46.35709
18                     University of Notre Dame    51.51351
19                          Columbia University    45.12953
20                             Davidson College    47.28033
21                              Bowdoin College    41.02061
22                             Emory University    29.65451
23                             Carleton College    30.96897
24                     Johns Hopkins University    26.43645
25  University of North Carolina at Chapel Hill    55.65581
26                       Wake Forest University    37.75084
27                       University of Virginia    49.73997
28                            Bryn Mawr College    38.64198
29              Case Western Reserve University    22.59189
30                    Claremont McKenna College    29.59583
31                        Georgetown University    48.24714
32                        Vanderbilt University    31.96162
33                    College of the Holy Cross    38.97102
34                          New York University    34.58034
35                  College of William and Mary    39.18223
36                        University of Chicago    30.74358
37                Washington and Lee University    38.77737
38                  Birmingham-Southern College    48.80952
39          University of Michigan at Ann Arbor    37.81298
40                           Niagara University    26.00223
41                           Trinity University    33.05831
42                        Washington University    23.84484
43                          Agnes Scott College    39.25501
44                   Carnegie Mellon University    22.89944
45                              Scripps College    21.99367
46                           SUNY at Binghamton    28.49497
47                          Wesleyan University    36.08718
48                                Colby College    34.57165
49                       Polytechnic University    35.65525
50                               Rhodes College    21.35445
51                           Grove City College    51.62162
52                       Sarah Lawrence College    34.24479
53                      Transylvania University    33.47051
54                             Colorado College    31.07166
55                              Drew University    20.31646
56                            Furman University    35.11020
57                             Grinnell College    31.10151
58                           Macalester College    30.21390
59                        Oglethorpe University    28.65948
60                      SUNY College at Geneseo    25.05480
61            University of Minnesota at Morris    67.27689
62                      University of Rochester    22.60822
63                           Wheaton College IL    59.56522
64                               Centre College    32.43243
65                 Illinois Wesleyan University    35.09687
66                        Trenton State College    40.83045
67                        University of Florida    41.00272
68                              Barnard College    37.87447
69             Rensselaer Polytechnic Institute    22.47299
70                               Vassar College    34.78956
71                              Hendrix College    38.00277
72                           Occidental College    28.05155
73              University of Illinois - Urbana    48.96155
74                       Fresno Pacific College    53.28467
75                                Smith College    39.54944
76              Stevens Institute of Technology    30.42434
77                    University of Puget Sound    24.34536
78                              Wofford College    29.19786
79                                 Bard College    34.00955
80                            DePauw University    29.89130
81                     Florida State University    34.81516
82                          Lawrence University    34.21331
83                              Oberlin College    24.50307
84                          Bucknell University    22.60687
85           Texas A&M Univ. at College Station    60.76623
86                             Union College NY    30.84112
87              University of Missouri at Rolla    45.07119
88                        Valparaiso University    30.11002
89                        Willamette University    29.76639
90              Worcester Polytechnic Institute    29.47277
91                          Brandeis University    26.97776
92            Brigham Young University at Provo    85.43132
93                                 Knox College    33.84615
94                             Linfield College    38.55219
95         Pennsylvania State Univ. Main Campus    33.35267
96                University of Texas at Austin    55.67280
97              California Polytechnic-San Luis    43.22767
98                              Erskine College    29.98205
99                        Mount Holyoke College    40.03044
100                                Reed College    22.77159
Top10perc_bott[,'Enroll_Rate'] 
  [1] 51.262626 42.481203 23.880597 45.205479 28.852459 55.874674  9.975397
  [8] 62.944162 41.704036 24.400000 38.372093 32.826748 35.841584 45.757576
 [15] 45.021251 33.605887 65.665236 32.146490 38.181818 30.854430 34.604905
 [22] 41.193182 61.627907 55.244755 35.624013 73.553719 44.422311 61.195055
 [29] 37.826541 36.528497 29.440182 52.731245 35.958632 31.528662 28.301887
 [36] 27.822581 25.364759 54.395604 27.804215 37.215034 45.344130 50.074590
 [43] 60.273973 49.585799 38.523077 41.824441 29.160740 60.571964 20.128088
 [50] 38.617021 15.351005 49.457177 52.883383 50.545094 57.896707 35.779817
 [57] 23.806866 46.093750 52.390641 97.882736 38.888889 35.015850 83.544304
 [64] 38.381743 37.744035 77.427184 77.712610 60.683761 35.627530 55.368098
 [71] 35.967140 66.099559 48.559382 60.087719 45.669291 57.632399 60.463768
 [78] 27.720207 46.188341 91.417166 47.493286 24.215407 58.106473 43.675418
 [85] 48.258706 35.303777 61.679790 38.737004 48.829701 42.477876 18.687873
 [92] 41.083744 43.370508 37.659784 27.083333 36.742424 30.796841 31.740614
 [99] 24.423963 68.877551

This shows that the top 100 universities have relatively low acceptance rates.

The mean acceptance rate of the top 100 universities was 57.9%, about 20 percentage points lower than that of the bottom 100 universities.

However, the enrollment rate doesn’t tell us whether a university is a so-called ‘high-rank’ school.
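
The code that produced these means is not shown, so here is one way the comparison might be computed. It is a sketch only: `mean_accept_by_rank` is a hypothetical helper, and the nine rows used here (copied from the glimpse output) are far too few to reproduce the 57.9% figure.

```r
# First nine rows of the dataset, copied from the glimpse() output
colleges <- data.frame(
  Apps      = c(1660, 2186, 1428, 417, 193, 587, 353, 1899, 1038),
  Accept    = c(1232, 1924, 1097, 349, 146, 479, 340, 1720, 839),
  Top10perc = c(23, 16, 22, 60, 16, 38, 17, 37, 30)
)
colleges$Accept_Rate <- colleges$Accept / colleges$Apps * 100

# Hypothetical helper: mean acceptance rate of the n highest and
# the n lowest rows when ranked by Top10perc
mean_accept_by_rank <- function(df, n) {
  ranked <- df[order(df$Top10perc, decreasing = TRUE), ]
  c(top    = mean(head(ranked$Accept_Rate, n)),
    bottom = mean(tail(ranked$Accept_Rate, n)))
}

gap <- mean_accept_by_rank(colleges, n = 3)
print(round(gap, 1))
```

On the full data the call would be `mean_accept_by_rank(College, 100)`, which should match `mean(Top10perc_top$Accept_Rate)` and `mean(Top10perc_bott$Accept_Rate)` up to how ties in ‘Top10perc’ are broken.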

\(\color{deepskyblue}{\text{ 2-2. Expenditures}}\)

  • Outstate : Out of state tuition
  • Room.Board : Room and board costs
  • Books : Estimated book costs
  • Personal : Estimated personal spending

information from : https://www.kaggle.com/flyingwombat/us-news-and-world-reports-college-data

ggpairs(College[,c(6,10,11,12,13)])

plot(College$Private,College$Outstate)

plot(College$Private,College$Room.Board)

Universities whose students had great high school grades tend to have high out-of-state tuition. Universities with high out-of-state tuition also tend to have high room and board costs. The correlation between ‘Top10perc’ and ‘Room.Board’ is mild (0.371).

Private schools have distinctly higher out-of-state tuition and somewhat higher room and board costs.

However, book costs and personal spending are not clearly correlated with the other variables.
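
The numbers in the pairs plot can also be read off a plain correlation matrix. The sketch below uses only the first seven rows of the five plotted columns, copied from the glimpse output, so its values will not match the full-data correlations quoted above. It only illustrates the call.

```r
# First seven rows of the five plotted columns, copied from the glimpse() output
costs <- data.frame(
  Top10perc  = c(23, 16, 22, 60, 16, 38, 17),
  Outstate   = c(7440, 12280, 11250, 12960, 7560, 13500, 13290),
  Room.Board = c(3300, 6450, 3750, 5450, 4120, 3335, 5720),
  Books      = c(450, 750, 400, 450, 800, 500, 500),
  Personal   = c(2200, 1500, 1165, 875, 1500, 675, 1500)
)

# Pearson correlation matrix (cor() defaults to method = "pearson")
cor_matrix <- cor(costs)
print(round(cor_matrix, 2))
```

On the full data, `cor(College[,c(6,10,11,12,13)])` would give the matrix for the same five columns passed to `ggpairs` above.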

plot(College$Private,College$Books)

plot(College$Private,College$Personal)

plot(College$Books,College$PhD)

plot(College$Personal,College$PhD)

Further, book costs and personal spending have low correlations with all the other variables in the dataset.

\(\color{deepskyblue}{\text{ 2-3. Infrastructure and Expenditure per student}}\)

  • S.F.Ratio : Student/faculty Ratio
  • Expend : Instructional Expenditure per student
qplot(-(College$S.F.Ratio),College$Expend)

The student/faculty ratio is negatively correlated with expenditure per student (the plot uses the negated ratio, so the trend appears as an upward slope).

plot(College$Private,College$Expend)

plot(College$Private,College$S.F.Ratio)

t.test(College$S.F.Ratio~College$Private) 

    Welch Two Sample t-test

data:  College$S.F.Ratio by College$Private
t = 15.111, df = 389.14, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 3.648025 4.739303
sample estimates:
 mean in group No mean in group Yes 
         17.13915          12.94549 

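Judging from the boxplot, whether a school is private doesn’t have a clear relationship with expenditure per student.

However, public schools have a higher student/faculty ratio.

To check that the difference is significant, I conducted a Welch two-sample t-test, assuming the student/faculty ratio is approximately normally distributed within each group. According to the test, the difference is significant (t = 15.11, p < 2.2e-16): public schools average 17.1 students per faculty member, against 12.9 for private schools.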
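
Since the t-test leans on a normality assumption, a rank-based test is a cheap cross-check. The sketch below runs both on simulated data, not on the real dataset: the group sizes and means are taken from the outputs above, while the standard deviation of 3.5 is an assumption made purely for illustration.

```r
set.seed(1)

# Simulated stand-in for the real data. Group sizes (212, 565) and means
# (17.14, 12.95) come from the outputs above; sd = 3.5 is an assumption.
sim <- data.frame(
  Private   = factor(rep(c("No", "Yes"), times = c(212, 565))),
  S.F.Ratio = c(rnorm(212, mean = 17.14, sd = 3.5),
                rnorm(565, mean = 12.95, sd = 3.5))
)

# Welch two-sample t-test (unequal variances), as in the analysis above
welch <- t.test(S.F.Ratio ~ Private, data = sim)
# Wilcoxon rank-sum test: no normality assumption
ranksum <- wilcox.test(S.F.Ratio ~ Private, data = sim)

print(c(welch = welch$p.value, wilcoxon = ranksum$p.value))
```

With the real data, the same two calls with `data = College` would show whether the conclusion depends on the normality assumption.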

Then maybe universities with a lower student/faculty ratio have a higher proportion of students who did well in high school.

qplot(College$S.F.Ratio,College$Top10perc)

cor(College$S.F.Ratio , College$Top10perc)
[1] -0.3848745

Looking at the graph and the correlation coefficient, these two variables are less correlated than expected (-0.38, a moderate negative correlation).

\(\color{deepskyblue}{\text{ 2-4. Graduation}}\)

  • Grad.Rate : Graduation Rate
plot(College$Private,College$Grad.Rate)

It’s quite hard to tell whether the two group means differ just by looking at a boxplot.

qplot(data=College,x=Grad.Rate,fill=Private,col=Private)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Looking at the histogram, public schools appear to have lower graduation rates than private schools.

\(\color{deepskyblue}{\text{ 3. Classification - Decision Tree}}\)

I’ll fit a classification tree for ‘Private’, and regression trees for ‘Top10perc’ and ‘Accept_Rate’, using the meaningful variables.

I referred to my previous project

https://rpubs.com/Damian/506438

College_class = College[,c('Private','Top10perc','Outstate','Room.Board','PhD','S.F.Ratio','Expend','Grad.Rate','Accept_Rate','Enroll_Rate')]  #get rid of meaningless variables
set.seed(1995)
shuffle_idx=sample(1:nrow(College_class),replace=F) #Random order (renamed so it does not mask base::sample)
College_class2=College_class[order(shuffle_idx),]
head(College_class2)
    Private Top10perc Outstate Room.Board PhD S.F.Ratio Expend Grad.Rate
249     Yes        35     5504       3528  71      17.7   6466        73
84      Yes        16    10750       5340  90      14.6   7972        64
37      Yes        50    19264       6206  98      10.4  13894        79
401     Yes        30    16975       4565  76      12.8  10888        83
474     Yes        16    13250       5420  84      12.3  11299        70
485      No        27     7411       4748  90      18.6  10134        57
    Accept_Rate Enroll_Rate
249    62.05694    75.46816
84     76.86646    34.33653
37     43.87435    34.00955
401    72.88607    25.16340
474    67.59621    30.11551
485    52.04991    13.24201
anov=rpart(Private~.,data=College_class2,method='class')
rpart.plot(anov,type=3,digits=2,fallen.leaves = T,main='Private')

anov2=rpart(Top10perc~.,data=College_class2,method='anova')
rpart.plot(anov2,type=3,digits=2,fallen.leaves = T,main='Top 10 Percent')

anov3=rpart(Accept_Rate~.,data=College_class2,method='anova')
rpart.plot(anov3,type=3,digits=2,fallen.leaves = T,main='Acceptance Rate')
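
The trees above are fitted on all rows, so they describe the data but say nothing about predictive accuracy. If that were of interest, the shuffled rows could be split into a training and a test part. The following is a minimal sketch of that idea on the built-in `iris` data rather than the college data; the 70/30 split is an arbitrary choice.

```r
library(rpart)

set.seed(1995)
# Shuffle the rows, as was done for College_class2
shuffled <- iris[sample(nrow(iris)), ]

# Hold out the last 30% of the shuffled rows (split ratio is an assumption)
n_train <- floor(0.7 * nrow(shuffled))
train   <- shuffled[seq_len(n_train), ]
test    <- shuffled[-seq_len(n_train), ]

# Classification tree fitted on the training rows only
fit  <- rpart(Species ~ ., data = train, method = "class")
pred <- predict(fit, newdata = test, type = "class")

# Share of held-out rows classified correctly
accuracy <- mean(pred == test$Species)
print(accuracy)
```

For the college data the analogous call would use `Private ~ .` with the first part of `College_class2` as the training set.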

\(\color{deepskyblue}{\text{ 4. Plotting using Leaflet}}\)

# Markers for the colleges at the top of Top10perc_top; the coordinates are hard-coded below
plot=leaflet()%>%
  addTiles()
plot=plot%>%
  addMarkers(lat=42.3601, lng=-71.0942,popup='1.MIT')%>%
  addMarkers(lat=34.1061, lng=-117.7105,popup='2.Harvey Mudd')%>%
  addMarkers(lat=37.8719, lng=-122.2585,popup='3.UCBerkeley')%>%
  addMarkers(lat=41.3163, lng=-72.9223,popup='4.Yale')%>%
  addMarkers(lat=36.0014, lng=-78.9382,popup='5.Duke')%>%
  addMarkers(lat=42.377, lng=-71.1167, popup='6.Harvard')%>%
  addMarkers(lat=40.3573, lng=-74.6672, popup='7.Princeton')%>%
  addMarkers(lat=33.7756, lng=-84.3963, popup='8.GeorgiaTech')%>%
  addMarkers(lat=41.8268, lng=-71.4025, popup='9.Brown')%>%
  addMarkers(lat=43.7044, lng=-72.2887, popup='10.Dartmouth')%>%
  addMarkers(lat=34.0414,lng=-118.7096,popup='11.Pepperdine')%>%
  addMarkers(lat=33.6405,lng=-117.8443,popup='13.UCIrvine')%>%
  addMarkers(lat=39.9522, lng=-75.1932,popup='14.UPenn')%>%
  addMarkers(lat=42.3709, lng=-72.5170,popup='15.Amherst College')%>%
  addMarkers(lat=42.7130,lng= -73.2036,popup='16.Williams College')%>%
  addMarkers(lat=42.2936,lng=-71.3059, popup='17.Wellesley College')%>%
  addMarkers(lat=41.7056,lng=-86.2353,popup='18.Notre Dame')%>%
  addMarkers(lat=40.8075,lng=-73.9626, popup='19.Columbia')%>%
  addMarkers(lat=35.5008,lng=-80.8447,popup='20.Davidson')%>%
  addMarkers(lat=43.9077,lng=-69.9640,popup='21.Bowdoin')%>%
  addMarkers(lat=33.7971,lng=-84.3222,popup='22.Emory') %>% 
  addMarkers(lat=44.4614,lng=-93.1558,popup='23.Carleton')%>%
  addMarkers(lat=39.3299,lng=-76.6205,popup='24.JohnsHopkins')%>%
  addMarkers(lat=35.9049,lng=-79.0469,popup='25.Chapel Hill') %>%
  addMarkers(lat=36.1341,lng=-80.2779,popup='26.WakeForest')%>%
  addMarkers(lat=38.0336,lng=-78.5080,popup='27.Virginia')
plot
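
The long chain of `addMarkers` calls can also be written with the coordinates stored in a data frame, which makes individual entries easier to check. This is a sketch under the assumption that the leaflet package is installed. It uses only the first five markers from the chain above, and the names `top_colleges` and `college_map` are made up for the example.

```r
library(leaflet)

# First five markers from the chain above, one row per college
top_colleges <- data.frame(
  label = c("1.MIT", "2.Harvey Mudd", "3.UCBerkeley", "4.Yale", "5.Duke"),
  lat   = c(42.3601, 34.1061, 37.8719, 41.3163, 36.0014),
  lng   = c(-71.0942, -117.7105, -122.2585, -72.9223, -78.9382)
)

# One addMarkers() call; the formulas refer to columns of top_colleges
college_map <- leaflet(data = top_colleges) %>%
  addTiles() %>%
  addMarkers(lng = ~lng, lat = ~lat, popup = ~label)

# Render only in an interactive session (e.g. RStudio viewer or browser)
if (interactive()) print(college_map)
```

Adding another college then means adding a row to the data frame instead of another line to the pipe.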

Looking at the map, you can see that the colleges with a high proportion of students who did well in high school are mostly private colleges on the East Coast, many of them in the Ivy League.